    1. eLife Assessment

      This study presents a useful database resource containing protein conformations generated through molecular dynamics simulations, with extensive quality evaluation and benchmarking. While the database is well-constructed and professionally organized, the evidence supporting its claimed representation of protein conformational landscapes is incomplete, as the short simulation times and starting-structure bias prevent true Boltzmann sampling of the conformational space.

    2. Reviewer #1 (Public review):

      Summary:

      The authors describe a new database that rigorously explores protein conformations.

      Strengths:

      It is extremely well done, using state-of-the-art tools, by a group at the top of the field of structural modeling. The quality evaluation and the benchmarking of the structures are outstanding, and the new database is expected to have a significant impact on the field.

      Weaknesses:

      The authors use MD simulations to generate some of the structures and therefore should have access to standard MD energies. I am surprised that no evaluation based on these energies is provided, as such an evaluation could be extended to free energies.
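      For reference, the textbook statistical-mechanics relations below connect per-conformation energies to populations and free energies. They are not taken from the manuscript, and they assume that E_i is an appropriate (potential or effective) energy, that conformations are treated as discrete states, and that the set of conformations covers the accessible space with Boltzmann statistics, which is the point the assessment questions for this dataset.

```latex
% E_i: energy of conformation i (e.g., from an MD force field)
% k_B: Boltzmann constant; T: temperature
p_i = \frac{e^{-E_i / k_B T}}{Z}, \qquad Z = \sum_j e^{-E_j / k_B T}

% Population of a macrostate A (a set of conformations) and the
% free-energy difference between macrostates A and B:
P_A = \sum_{i \in A} p_i, \qquad
\Delta G_{A \to B} = G_B - G_A = -k_B T \ln \frac{P_B}{P_A}
```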

    3. Reviewer #2 (Public review):

      Summary:

      The authors developed a dataset of protein conformations by running molecular dynamics simulations starting from both native and decoy conformations for a large number of proteins. These conformations were put together as a dataset for querying and downloading, along with their energies under different force fields. The authors suggest that such conformations represent the proteins' conformational landscape, so that they will be useful for evaluating methods generating multiple conformations of proteins.

      Strengths:

      The dataset is online and working. It has good documentation for others to use.

      Weaknesses:

      The biggest weakness is that the collected conformations very likely do not represent the true conformational landscape. To represent the conformational landscape, the structures need to be sampled according to the Boltzmann distribution. However, in this study, conformations are generated by running very short (125 ps to 375 ps) MD simulations starting from near-native conformations and decoys. Such short simulations produce only small fluctuations around the starting conformations, so the distribution of conformations is largely dominated by the distribution of the initial conformations, which are by no means Boltzmann distributed. A conformation might be physically plausible but have a very small weight in the Boltzmann distribution. Conversely, conformations with large weights might not be in the dataset.
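      To make the Boltzmann-weight argument concrete, here is a minimal Python sketch (not from the manuscript or the database) that converts per-conformation energies into normalized Boltzmann weights. It assumes energies in kcal/mol and a temperature of 300 K, and it treats each stored conformation as one discrete microstate, ignoring entropy and sampling coverage; the function name boltzmann_weights and the example energies are hypothetical.

```python
import math

# Boltzmann constant in kcal/(mol*K); assumes energies are given in kcal/mol.
K_B_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies, temperature=300.0):
    """Return normalized Boltzmann weights p_i = exp(-E_i / kT) / Z.

    Each entry of `energies` is treated as one discrete conformation
    (microstate); entropy and sampling coverage are ignored.
    """
    if not energies:
        raise ValueError("need at least one energy")
    kt = K_B_KCAL_PER_MOL_K * temperature
    # Subtract the minimum energy before exponentiating so exp() cannot
    # overflow; the shift cancels when the weights are normalized.
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    z = sum(factors)
    return [f / z for f in factors]


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol) for three conformations.
    example_energies = [0.0, 1.0, 3.0]
    weights = boltzmann_weights(example_energies)
    for e, w in zip(example_energies, weights):
        print(f"E = {e:4.1f} kcal/mol -> weight {w:.4f}")
```

      At 300 K (k_B T ≈ 0.596 kcal/mol), a conformation 3 kcal/mol above the lowest-energy one receives less than 1% of its weight, so a dataset that stores many such conformations can over-represent them relative to their Boltzmann weight.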

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript describes a web-based tool that allows researchers to compare large numbers of representative ("plausible") conformations of proteins. It also includes energetic analysis from multiple widely used structure-prediction methods.

      Strengths:

      This tool will likely be useful for students who want to learn more about the ensemble properties of proteins. The resource is well organized, and it represents a large investment of computing resources.

      Weaknesses:

      It is not entirely clear how the database may be utilized by other groups to advance research. It could be helpful if the authors add a short section that provides example use cases that illustrate how this database can support new strategies for studying protein dynamics.

    1. eLife Assessment

      This is an important study uncovering a new role of the SETD6-PPARγ axis in the regulation of hepatic lipid metabolism. The data convincingly demonstrate that methylation of PPARγ by SETD6 plays a key role in this process, linking lysine methylation to transcriptional control of lipid storage genes.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript from the Levy lab, the authors investigate whether SETD6 regulates hepatic lipid accumulation through direct methylation of PPARγ. They show that SETD6 binds and mono-methylates PPARγ at K170, and provide evidence that this modification enhances PPARγ occupancy at target promoters, promotes expression of lipid metabolism genes, and facilitates lipid droplet accumulation in HepG2 cells. The authors also identify a positive feedback loop in which PPARγ activates SETD6 transcription in a methylation-dependent manner, thereby reinforcing this lipogenic program. Overall, the work presents a novel SETD6-PPARγ regulatory axis linking lysine methylation to transcriptional control of lipid storage genes, with possible relevance to NAFLD-associated biology.

      In all, I find this to be an important paper that describes and advances a new regulatory pathway that has significance to human health and disease. It would also be of interest to a broad audience. That said, there are also some concerns that the authors should address, as outlined below.

      Major concerns (pertaining to rigor; highest priority)

      (1) Overall, the work presented is of high quality, and the data nicely support the conclusions; however, a few panels that lack controls or information should be strengthened:
      a. The co-IP panel in Figure 1B lacks a lane in which HA-SETD6 is expressed without PPARγ. This control is needed to verify that the SETD6-HA signal depends on PPARγ.
      b. In Figure 1C, the authors should show that the co-IP works in both directions (include an IP for PPARγ with a blot for SETD6). I am also somewhat confused by the labeling, with IP shown both on the left and on top of the panel next to the beads label. More importantly, the data would be stronger if the authors used a deletion line to validate that the co-IP depends on the presence of both proteins.
      c. The same IP labeling issue exists in Figure 3B (the label appears both on the side and on top).
      d. Antibody information (e.g., the source of the pan-methyl antibody and the dilutions at which the antibodies were used) is missing.

      Nice-to-have experiments (medium priority; strongly consider)

      (2) A remaining gap is how K170me1 contributes to DNA binding and gene transcription. One possibility is that methylation enhances the DNA-binding activity of PPARγ. Given that the authors have all of the reagents, it would be possible to perform a gel shift assay (or use another approach) with and without SETD6-mediated methylation. Is DNA binding affected or enhanced?

      (3) Along these lines, I wonder whether there is another possibility: could SETD6-mediated methylation of PPARγ drive the SETD6-PPARγ interaction? In other words, is SETD6 still associated with the K170R mutant of PPARγ, and is this interaction required for promoter recruitment? Alternatively, would a catalytically dead version of SETD6 fail to associate with PPARγ? Currently, no experiments test the impact of an unmethylatable version of PPARγ or a catalytically dead version of SETD6 on the SETD6-PPARγ interaction or on SETD6 recruitment to promoters.

      Minor concerns (text and figure display)

      (4) The text has multiple typos and grammatical errors, and there are some issues with the figure display.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors investigated the regulation of the transcription factor PPARγ by the post-translational modification lysine methylation. The data demonstrate that the lysine methyltransferase SETD6 targets PPARγ for methylation using biochemical and cell-based assays. Methylation of PPARγ occurs in its DNA binding domain, and the authors demonstrate that loss of methylation limits PPARγ chromatin binding, particularly to lipid storage and metabolism gene promoters. As a physiological output, the authors demonstrate that deletion of SETD6 and loss of PPARγ methylation also disrupt lipid droplet accumulation in hepatocytes. In addition, the authors uncover a positive feedback loop in which SETD6 methylation of PPARγ also regulates its binding to the SETD6 promoter and expression of the gene.

      Strengths:

      One of the key strengths of this manuscript is the novelty of the findings in terms of identifying a new mode of regulation of PPARγ that modulates its chromatin association in cells and thereby regulates lipid metabolism genes. The authors nicely combine biochemical studies of SETD6 activity with cell-based assays investigating PPARγ and SETD6 function in regulating lipid storage. The data supporting these conclusions are largely convincing, and multiple assays are frequently used to provide sufficient support. This work therefore expands the known regulatory modes of PPARγ and identifies a new target for SETD6, an enzyme that targets a number of other transcription factors. Furthermore, the regulatory loop that controls SETD6 expression via PPARγ methylation is likely important for understanding SETD6 function in different cell types that have high levels of lipid accumulation or regulation. The gene expression and lipid accumulation assays are useful for directly testing the physiological outcome of loss of SETD6 activity or PPARγ methylation.

      Weaknesses:

      The data presented in the manuscript are largely convincing in support of the authors' conclusions; however, there are some errors in the presentation of the figures and some issues in the text that would benefit from editing. Furthermore, some important questions are not fully addressed in the results or discussion. It would be great if the authors could speculate more on the diverse roles of SETD6 in methylating transcription factors and/or provide more context regarding the conditions that are likely to support methylation of PPARγ by SETD6. Also, while a potential cross-talk between methylation and phosphorylation is described in the discussion, it would be great to provide more structural insight into how this might regulate DNA binding of PPARγ and/or discuss whether there are other possibilities given the location of the target lysine in the DNA-binding domain.

    1. eLife Assessment

      In this useful manuscript, Yang et al. attempt to show that platelet recruitment to the liver via macrophages contributes to APAP-induced liver injury, but in many areas the data supporting the conclusions are incomplete. For example, the idea that platelets affect only KC glycolysis, and not the metabolism of other cells, to mediate the phenotype after injury is not adequately supported by the evidence. Additional experiments are recommended to strengthen the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Yang et al. expand on their previous work showing that platelet recruitment to the liver via liver macrophages is important for APAP-induced liver injury. Here, they show that platelets induce a glycolytic switch in liver non-parenchymal cells, including Kupffer cells, and that this is mediated by the protein Aldolase A carried by platelet-derived extracellular vesicles (PEVs). They show that targeting Aldolase A may be a valid therapeutic strategy for severe APAP injury.

      Strengths:

      (1) They nicely show that platelet effects in APAP injury are mediated by Aldoa via platelet-derived extracellular vesicles.

      (2) Their data show that one of the effects of platelets in APAP liver injury is the induction of a metabolic switch to the glycolytic pathway, including in KCs.

      (3) Their data points to the therapeutic potential of targeting ALDOA in severe APAP liver injury.

      Weaknesses:

      (1) They have not shown that the platelet-induced glycolytic switch is only in KCs.

      (2) They also have not shown that the role of KCs in APAP injury is primarily mediated by their interaction with platelets and the subsequent glycolytic switch.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors have investigated the role of platelet-derived ALDOA in acetaminophen (APAP)-induced acute liver injury. There are some major flaws in data interpretation, as described below. While the decrease in liver injury due to platelet depletion and the lower injury in platelet-specific ALDOA KO mice seem real, the claims related to EVs and platelet-KC crosstalk are not well supported.

      Strengths:

      The core findings are interesting and supported by the data.

      Weaknesses:

      (1) At least two additional time points, one at 6 h and another at 24 h, should be examined in the APAP model to better understand the dynamics of liver injury, especially after platelet depletion.

      (2) Interpretation of the experiments in Figure 2 with clodronate is flawed. 2-DG pretreatment and CLDN administration alone both seem to decrease liver injury substantially, so it is not surprising to see very little injury in the 2-DG+CLDN group.

      (3) Since both 2-DG and CLDN were administered before APAP, they may interfere with APAP metabolism. This should be checked by measuring GSH depletion at 30 min after APAP treatment. The same question applies to the data in Figure S2.

      (4) None of the studies include data on the specific, well-characterized steps of APAP toxicity, such as GSH depletion, JNK activation, and mitochondrial injury; only injury endpoints are measured. It is critical to measure these mechanistic steps. This applies to all studies, but most importantly to the ALDOA-PF-KO mice in Figure 6.

      (5) The interpretation of the data in Figure 5F is flawed. Since platelet depletion also decreases liver injury, it cannot be deduced that the decrease in ALDOA occurs only in platelets. Many other things are changing.

    4. Reviewer #3 (Public review):

      Summary:

      The authors address the possibility that platelet (PLT)-derived EVs are important mediators of acute liver injury. KCs are important mediators of inflammation and are thought to require metabolic reprogramming to exert their effects during injury. The authors use an APAP-induced liver injury (AILI) model. They show that PLTs are recruited and that they interact with KCs in this model system. RNA-seq of KCs showed upregulation of glycolysis and gluconeogenesis. PLT depletion led to reduced liver injury, and in this setting RNA-seq of KCs showed downregulation of glycolysis. In vitro co-culture of KCs and PLTs recapitulated the glycolysis findings. In vivo, 2-DG inhibited liver injury, but not in the setting of KC depletion. They went on to show that PLT-derived EVs mediate this effect on KCs using a mix of in vitro and in vivo assays, although control EVs were lacking. After performing mass spectrometry on EVs, they find that ALDOA is the critical payload of the PEVs that mediates the pro-glycolytic effect in vivo. They both delete ALDOA from PLTs and use an ALDOA inhibitor to show that injury in AILI requires ALDOA.

      Strengths:

      This is generally an interesting series of observations with an elegant mechanism. Many of the experiments are done in vivo with highly rigorous KO models. However, in many of the EV experiments, there are concerns about a lack of appropriate controls that might limit the rigor of those aspects of the study. 

      Weaknesses:

      (1) There is strong variability in the gene expression between mice in Figure 1B. I worry that the signals may not be statistically significant. The authors should assess the statistical significance.

      (2) In Figure 2B, the necrosis areas that are circled in the image do not seem to match the quantitation on the right. For example, I don't see 60% necrosis in the APAP PBS group, nor do I see 5-10% necrosis in the CLDN APAP group. Additional, clearer images are needed, and the circled necrosis areas should be shown.

      (3) In Figure 2D, a larger N should be shown. The number of mice (3) differs from that in the other experiments, so the exclusion of those mice should be explained.

      (4) In general, control EVs from a non-PLT source should be used for all EV-related experiments. EVs derived from AML12 hepatocytes would seem to be a reasonable control for some of the experiments. Otherwise, it is hard to know whether this is a general EV effect or one that is specific to PLT-derived EVs. In Figure 3B, EVs from non-PLT sources should be used as a control, since all EVs may express some level of TSG101 or CD63. In addition, control EVs should be used to test effects on KC metabolism, since the claim is that the effects are specific to PLT-derived EVs. Similarly, Figure 4 needs an EV control that is not from PLTs.

      (5) Figure 5B should include an EV control in the blot. Most of the blots need controls from AML12 EVs or from another in vivo source.

      (6) It is a little difficult to imagine how enough ALDOA protein could be transmitted from PEVs to influence KC glycolysis on the gene expression level. It is possible that ALDOA is required for PLT-induced activation of KCs, or that EVs from PLTs can induce a metabolic shift in KCs. However, it has not been definitively shown that ALDOA from PEVs is directly causing the KC activation. Ultimately, it would be good to obtain PEVs from ALDOA WT and KO mice, then provide these PEVs to AILI mice without PLTs to see if they have differential effects on the AILI model. This would really demonstrate that the ALDOA in the PEVs is mediating the glycolytic, injurious effect.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Kim and Parsons present a timely overview of the NTR/prodrug system and its applications in regenerative biology research, with particular emphasis on tissue-specific cell ablation. The system has substantially advanced the field by enabling non-invasive, conditional cell elimination, and has proven especially powerful in zebrafish, though applications in other classical model organisms are also noted. The review covers the historical origins of the NTR system, its use in regeneration studies, small molecule screening, and genetic and CRISPR-based screening, as well as future directions, including the development of the highly efficient NTR2 enzyme variant.

      Strengths:

      This is a useful and well-structured contribution. The manuscript is a valuable resource for the regeneration biology community.

      Weaknesses:

      The impact and scientific value of this paper could be meaningfully enhanced by addressing several points outlined below. The concerns centre on completeness, conceptual precision, and the depth of mechanistic discussion.

      (1) Title: Species specificity.

      Given that the review's primary focus is the zebrafish model, it would be appropriate to include the species name in the title. This would improve discoverability and accurately set the scope of the article for prospective readers.

      Thank you for this suggestion. In revising the review, we have substantially expanded the content to address the reviewers' comments, including adding more detail on the use of NTR in other species. We agree that the majority of published work, and the research we cover, has been conducted in zebrafish, and we have clarified this in the abstract and introduction. However, our aim in writing the review was also to highlight that there is no intrinsic barrier to adopting this technique more broadly in other systems. Notably, NTR was first developed in mice, but with a prodrug that proved difficult to use, and it was not widely pursued. In mouse models, the development of DTR offered an alternative, though that approach carries risks of kidney toxicity and is incompatible with chronic ablation due to immunogenicity. Given this context, we would prefer to retain a title that does not limit the scope exclusively to zebrafish, so as not to discourage readers working in other model systems who might benefit from considering the NTR system.

      (2) Subchapter: Physical injury.

      The subchapter enumerates different types of physical injury models but would benefit from a more substantive comparative discussion. In particular, the authors are encouraged to address the following:

      (2.1) Outcome comparison: Surgical and other invasive approaches cause damage to entire tissue structures comprising multiple cell types, whereas tissue-specific genetic ablation eliminates a defined cell population while leaving the surrounding architecture largely intact. This fundamental distinction has direct implications for the interpretation of regenerative outcomes and should be clearly articulated.

      We appreciate the reviewer raising these important points, as well as those noted in Section 2.2. We addressed the concerns from Sections 2.1 and 2.2 throughout multiple parts of our review, specifically in the following sections:

      • Physical injury – where we highlight the importance of precisely characterizing the nature and extent of tissue damage in order to appropriately interpret subsequent biological responses.

      • Chemogenetic cell-specific ablation – where we expand on this theme by discussing the advantages of selectively eliminating discrete cell populations and how this improves mechanistic interpretation of regeneration.

      • Development of NTR as a suicide gene – where we examine apoptotic pathways and their relevance to nitroreductase-mediated cell ablation.

      • NTR/prodrug systems in regenerative studies – where we compare what is currently known about immune activation and inflammatory responses across different NTR-based ablation paradigms.

      (2.2) Inflammatory response: Invasive injuries typically trigger a robust inflammatory response, which itself can be a potent driver of regeneration. By contrast, genetic cell ablation may elicit a qualitatively different inflammatory reaction. A comparative discussion of this distinction would help readers appreciate a critical limitation of genetic ablation systems relative to models of natural, accidental tissue damage.

      Please see our response to 2.1 above.

      (3) Subchapter: Cell-specific toxins.

      This subchapter would benefit from several targeted expansions:

      (3.1) Off-target effects: The authors should include evidence that the exemplified drugs have known off-target activities, with a discussion of how these confounded the interpretation of experimental data. At least a few concrete published examples should be cited.

      Thank you very much for the comments. We have strengthened the discussion of off-target effects by adding concrete published examples. We now note that MPTP/MPP⁺ can affect noradrenergic and serotonergic systems in addition to dopaminergic neurons, that aminoglycoside antibiotics can damage support cells and afferent neurons at higher concentrations with compound-specific differences in ototoxicity, and that streptozotocin exhibits hepatotoxicity beyond pancreatic β-cells.

      (3.2) Completeness of the toxin list: The current list appears illustrative rather than comprehensive. A more complete enumeration would be valuable, particularly for neurotoxins and drugs targeting sensory cells, as these are highly relevant to the zebrafish regeneration field.

      We have now consolidated the toxins discussed throughout the review into Table 1, which includes additional entries alongside the previously listed agents. We have explicitly noted that this list is representative rather than exhaustive, as the full range of cell-specific toxins used across species is extensive.

      (3.3) Interspecies differences: It would be informative to specify whether drug specificity differs across species, as this is a practical consideration for researchers working in organisms other than zebrafish.

      We appreciate the reviewer’s question regarding potential interspecies differences in prodrug performance. Early work using NTR in mammals was conducted in mice, and all five published mouse studies relied exclusively on CB1954. No other NTR-activating prodrugs have been reported in mouse models, so direct comparisons are not available. Likewise, all published Xenopus studies used MTZ and thus do not provide internal comparisons across prodrugs. The Nematostella study employed NFP (citing rationale from a zebrafish study) and the approach yielded effective ablation.

      The only non-zebrafish study that directly compared prodrugs is the Drosophila work, which evaluated MTZ, RNZ, and NFP and reported lower activity for MTZ relative to the other compounds. Because it is not clear whether the authors were aware of the batch variability of MTZ or the need for freshly prepared solutions, interpreting this specific comparison is difficult.

      To address the reviewer’s comment, we have expanded the section on non-zebrafish organisms to clearly state which prodrug was successfully used in each species. However, given the limited number of studies, the absence of titration experiments, and the lack of standardized conditions across laboratories, we do not feel that the available evidence supports drawing conclusions about interspecies differences in prodrug performance.

      Consistent with our original discussion and based on the broader biochemical and empirical data available, we continue to recommend RNZ as the starting point for new experiments.

      (4) Subchapter: Optogenetic cell ablation.

      The authors note that optogenetic cell ablation has not yet been applied in conventional regeneration studies. It would strengthen this section to include a discussion of the underlying reasons for this gap, whether technical or biological, so that readers can appreciate the barriers and potential for future adoption.

      We thank the reviewer for this helpful suggestion. As recommended, we have added a concise, explicitly speculative statement discussing potential technical factors that may explain why optogenetic cell ablation has not yet been widely applied in regeneration studies. Specifically, we note that KillerRed-based ablation requires localized light delivery and ROS generation, making it best suited for discrete, optically accessible cells and less practical for targeting large or deep tissues. We also highlight that the dependence on microscopy-based illumination inherently limits throughput. This new text clarifies possible barriers to broader adoption while acknowledging that these points remain speculative.

      (5) Terminology: "Suicide gene".

      The use of the term "suicide gene" for nitroreductase is conceptually imprecise and merits reconsideration. Strictly speaking, a suicide gene is one whose expression alone is sufficient to kill the cell, as in the case of genes encoding direct triggers of apoptosis or the catalytic A subunit of diphtheria toxin (DTA). NTR does not meet this criterion: it requires the exogenous administration of a prodrug (e.g., metronidazole) to produce a cytotoxic metabolite and is therefore only conditionally lethal.

      It is worth noting that nitroreductases evolved in bacteria and fungi as enzymes involved in chemoprotection and detoxification, converting potentially toxic and mutagenic nitroaromatic compounds into less harmful metabolites (PMID: 18355273). This biological context further underscores that NTR is not inherently a lethal protein. The authors are encouraged to replace or qualify the term "suicide gene" and instead adopt terminology that more accurately reflects the conditional, prodrug-dependent nature of the system.

      We appreciate the reviewer’s thoughtful attention to terminology. We agree that, in its most classical and stringent sense, a suicide gene is one whose expression alone is sufficient to induce cell death. We also recognize that NTR does not meet this strict criterion.

      At the same time, we note that the term has broadened in contemporary usage, particularly within applied and translational contexts, to encompass prodrug-dependent systems. For example, the National Cancer Institute Thesaurus defines a suicide gene as “a gene which will cause a cell to kill itself, typically through interaction with a prodrug,” and Taber’s Medical Dictionary likewise states that it is “a gene that causes a cell to kill itself, usually by encoding an enzyme that converts a nontoxic prodrug into a toxic metabolite.” Under these widely used definitions, NTR is included within the scope of suicide gene systems.

      Nevertheless, we appreciate that terminology in this area is not universally standardized. To ensure clarity for all readers, we have added a brief definition in the revised manuscript explicitly noting the conditional, prodrug-dependent nature of NTR-mediated ablation. We are grateful to the reviewer for prompting this clarification.

      (6) NTR/MTZ in regenerative studies: Mechanistic depth.

      While the review catalogues several studies employing the NTR/MTZ system, it lacks mechanistic depth regarding the cellular basis of ablation. The following questions should be addressed, where evidence exists in the literature:

      (6.1) Temporal dynamics of cell death: What is known about the kinetics of NTR/MTZ induced lethality across different tissue types in larval and adult zebrafish, as well as other organisms? Are there age- and tissue-specific differences in the speed or completeness of ablation?

      Thank you for this important question. We have added text noting that the kinetics and completeness of NTR/prodrug-mediated ablation vary across experimental contexts, including with differences in NTR expression, enzyme/prodrug pairing, dose, cell type, and developmental stage. Published studies illustrate that the time course of ablation can differ substantially between models. Because most studies were designed to optimize ablation within individual tissues rather than for direct side-by-side comparison, the literature does not yet support broad quantitative conclusions about age- or tissue-specific differences across systems.

      (6.2) Mechanism of cell death: What is the cellular basis of NTR/MTZ-induced cytotoxicity in zebrafish? In particular, do the toxic metabolites preferentially cause mitochondrial damage or nuclear DNA damage, and what downstream death pathways are engaged?

      Thank you for the comments. We have added text discussing the mechanism of NTR/MTZ-induced cell death. We now note that NTR-mediated reduction of MTZ generates reactive intermediates that cause DNA damage and oxidative stress, with cell death occurring predominantly through apoptosis. We have also more strongly emphasized that in dopaminergic neurons, mitochondrial damage was identified as the primary cytotoxic mechanism. We acknowledge that the relative contribution of these pathways is likely to vary by cell type and remains an important area for future study.

      (6.3) Proliferative versus post-mitotic cells: Are proliferating and non-proliferating cells equally sensitive to the NTR/MTZ system, or does the proliferative status of a cell influence susceptibility? This is a practically important question for researchers designing ablation experiments in tissues with mixed cell populations.

      We appreciate the reviewer’s insightful question. We have now added a brief clarification to this section explaining that the NTR/MTZ system has been shown to act in a cell-cycle–independent manner, and both proliferating and post-mitotic cells can be ablated effectively.

      (6.4) Ablation of progenitor cells: Are there published examples demonstrating that co-ablation of differentiated functional cells and organ-specific progenitor cells abolishes regenerative capacity? Such examples would be highly informative in illustrating the system's power to dissect the cellular requirements for regeneration.

      To our knowledge, the zebrafish lateral line currently provides the clearest example in which NTR-mediated ablation of progenitor populations results in a loss of regenerative capacity. In this system, targeted ablation of support-cell progenitors severely reduces hair-cell regeneration, illustrating how NTR enables direct testing of cellular requirements for tissue repair.

      Addressing the points above, particularly the comparative discussion of injury models and inflammatory responses, the clarification of terminology, and the mechanistic discussion of NTR/MTZ-induced cell death would substantially strengthen the review's scientific contribution and utility.

      Reviewer #2 (Public review):

      Summary:

      Kim and Parsons reviewed the nitroreductase (NTR)/prodrug system: when engineered cells expressing the enzyme NTR are treated with prodrug (e.g. metronidazole), NTR converts the prodrug into a cytotoxic compound that kills these cells. The review covers how the system has been developed, spatiotemporal control of targeted cell ablation, and its broad utility to study regenerative mechanisms, model human diseases, and screen chemicals to discover pro-regenerative and protective compounds. They further discussed the newer version of NTR, a more potent prodrug, and experimental design, which not only expands the possible utility of the NTR/prodrug system, but also allows the research community to develop a precise, reproducible and versatile platform.

      Strengths:

      The review summarized landmark applications of the NTR/prodrug system and recent studies, with a focus on the model organism zebrafish. The review provides a good gateway to understanding the system and considering regenerative studies.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #3 (Public review):

      Summary:

      This manuscript by Kim and Parsons presents an overview of the nitroreductase/metronidazole (NTR/MTZ) cell ablation system.

      Strengths:

      This manuscript nicely places the NTR/MTZ system in the context of other cell ablation methods, with a discussion of their respective advantages and disadvantages. This review is particularly useful for highlighting the many ways the NTR/MTZ system has been applied to study the regeneration of multiple cell types and to model different degenerative human diseases. The review concludes with a discussion on recent improvements made to the system and practical considerations and "best practices" for NTR-based experiments. This review could be a helpful resource, especially for researchers new to regeneration or cell ablation studies.

      Weaknesses:

      Although the NTR/MTZ system has been used in other model organisms, this review is primarily focused on its uses in zebrafish. While this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, discussion of the unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review. Additional minor revisions, as suggested below, could also improve readability.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Since the lab mouse is an important mammalian model system, with certain tissues harbouring some regenerative capabilities, including the peripheral nervous system (e.g., sciatic nerve regeneration after crush), and myelin, etc., it would be great if a section could be included to discuss the potential adoption of the NTR/prodrug system in future mouse studies.

      We appreciate the reviewer’s suggestion to discuss the potential future use of the NTR/prodrug system in mouse models. In surveying the literature, we identified only five mouse studies employing NTR, all of which used CB1954. These early studies were conducted primarily as proof-of-principle work in the context of gene-directed enzyme prodrug therapy (GDEPT) for cancer, rather than for regenerative or lineage-specific ablation applications. We added this point to the text.

      Since those reports, we have not found additional examples of NTR use in mice. We do not know the precise reasons for this limited adoption, but it may reflect the availability of alternative ablation systems that are widely established in mouse research, such as the diphtheria toxin receptor (DTR) system.

      We agree that certain mouse tissues exhibit regenerative capacity and that targeted ablation tools can be valuable in such contexts. To address the reviewer's point, we have added text noting the very limited historical use of NTR/CB1954 in mouse. We have no explanation as to why no one moved on to using NTR/MTZ in the mouse, but we note in two places in the text that DTR is the preferred method for mouse ablation experiments (even though DT does cause kidney damage and is incompatible with chronic studies).

      Minor:

      (1) Lines 174-176: the sentence was repeated.

      (2) Figure 1, for the transgenic line, please be consistent with the line name in italics.

      Reviewer #3 (Recommendations for the authors):

      (1) In the abstract as well as in the main text, the authors note that the NTR/MTZ system has been used in multiple model systems. Yet, most of the review, and especially the practical advice given at the end, is very zebrafish-focused. Although this is understandable given the wide adoption of NTR/MTZ in the zebrafish field, the authors might consider revising the abstract to make it clearer that this review is primarily concerned with the use of the NTR/MTZ system in zebrafish.

      Thanks for the suggestion. We have changed the second half of the first paragraph of the abstract.

      That said, a brief discussion of any unique considerations and/or challenges for non-zebrafish systems would be an interesting addition and could broaden the potential audience for this review.

      Agreed. We have expanded the text in several places to discuss the NTR system in organisms other than zebrafish, and we especially expanded our discussion of NTR in the mouse.

      (2) Line 176: There is a repetition of the sentence, "NTR/MTZ-mediated ablation has also been adapted for other model organisms."

      Found and deleted. Thank you!

      (3) Line 177: To improve clarity, the authors should include species names to prevent confusion. For example, both Xenopus laevis and Xenopus tropicalis are commonly used model organisms. Similarly, multiple Drosophila species are used by researchers.

      We added the species names (Drosophila melanogaster and Xenopus laevis) to the text.

      (4) Can the authors address whether alternatives to MTZ (RNZ, etc.) have the same issues with batch-to-batch variability? That might be an important consideration for potential users. It would also be useful to include practical guidance for accounting for batch variability, for example, how to determine optimal prodrug concentrations, whether effective concentrations need to be determined for every batch/replicate/experiment, etc.

      We added text noting that it is not yet known whether RNZ exhibits batch-to-batch variability similar to MTZ, as this has not been systematically reported. Given the potential for variability, it would be prudent for researchers to titrate each new batch of RNZ or, alternatively, adopt a dosing strategy that exceeds the minimum effective concentration to ensure consistent ablation results.

      (5) For the last section ("Experimental design: Practical and technical considerations"), readability would be improved by applying a consistent bullet point format.

      Made the changes as requested.

      (6) Figure 1: Asterisks are not defined.

      The asterisks were meant to link two boxes depicting the same transgene without repeating the transgene name. As this was evidently unclear, we have also added an explanation to the legend.

      (7) Figure 3: Given that the schematics specify expression of NTR1 and NTR1.1, I assume this figure is adapted or based on previous published report(s). If so, the reference(s) should be noted in the figure legend or on the figure itself (as done for Figure 1). If the schematic is meant to depict only in general terms how binary expression vectors can be used, a more inclusive "NTR" label might be less confusing.

      We changed the figure and the figure legend.

      (8) Figure 4: To improve readability and accessibility, the authors should consider modifying panels C-N to use a more colorblind-friendly palette (e.g., green/magenta) or to present each channel as separate grayscale images.

    2. eLife Assessment

      This Review Article nicely synthesizes the development, applications, and recent technical advances of the nitroreductase/prodrug system, highlighting how it enables precise spatiotemporal cell ablation and experimental platforms for studying regenerative mechanisms and screening for pro-regenerative or protective compounds. Together, the article provides a conceptual and practical overview that will help researchers adopt and further develop this versatile approach in regenerative biology. It will be of interest to researchers studying regeneration, disease modelling, and targeted cell ablation, particularly those working with zebrafish and other genetic model systems.

    3. Reviewer #1 (Public review):

      Summary:

      Kim and Parsons present a timely overview of the NTR/prodrug system and its applications in regenerative biology research, with particular emphasis on tissue-specific cell ablation. The system has substantially advanced the field by enabling non-invasive, conditional cell elimination, and has proven especially powerful in zebrafish, though applications in other classical model organisms are also noted. The review covers the historical origins of the NTR system, its use in regeneration studies, small-molecule screening, and genetic and CRISPR-based screening, as well as future directions including the development of the highly efficient NTR2 enzyme variant.

      Strengths:

      This is a useful and well-structured contribution. The manuscript is a valuable resource for the regeneration biology community.

      Weaknesses:

      The revised manuscript shows significant improvements; however, two points remain insufficiently addressed and should be resolved in the final version.

      (1) The term 'suicide gene'

      As noted in my first round of revisions, the term 'suicide gene' as applied to bacterial nitroreductase remains unaddressed in the revised manuscript, despite being scientifically inappropriate and a potential source of confusion regarding the NTR/Mtz mechanism.

      'Suicide' implies an intrinsic, cell-autonomous programme of self-destruction. This is incompatible with the NTR/Mtz system, in which cell death is experimentally induced through exogenous administration of metronidazole (Mtz) by the investigator. While the 'suicide gene' framing may have utility in the cancer therapy literature, likely to aid communication with non-specialist and clinical audiences, it is not standard in the zebrafish field, where NTR is more accurately described as a conditional toxigene. Since this review focuses predominantly on zebrafish models, its terminology should reflect that of the relevant literature.

      A further conceptual problem with the 'suicide gene' framing is that it obscures the pharmacological nature of metronidazole. Mtz is a pharmaceutical agent with intrinsic baseline toxicity: extended exposure or modestly elevated concentrations cause toxic side effects and lethality even in non-transgenic (wild-type) zebrafish (PMID: 24428354). NTR-expressing cells do not self-destruct; rather, they are rendered selectively hypersensitive to Mtz relative to other eukaryotic cells by virtue of expressing the enzyme. This distinction is mechanistically important and should be reflected in the language used throughout the manuscript.

      In summary, the term 'suicide gene' does not accurately capture enzyme-mediated bioactivation of an exogenous prodrug and should be removed from the manuscript.

      (2) Barriers to using the NTR/Mtz system in non-aquatic model organisms

      In response to my suggestion that the title should include "zebrafish" to accurately convey the scope of the review to prospective readers, the authors stated that "there is no intrinsic barrier to adopting this technique more broadly in other systems," citing the example that "NTR was first developed in mice, but with a prodrug that proved difficult to use, and it was not widely pursued." These two statements are, however, contradictory: if the prodrug proved difficult to use, this constitutes precisely the kind of practical barrier the authors claim does not exist. The authors should clarify and reconcile this inconsistency, and provide a more thorough discussion of why the NTR/Mtz system has seen limited adoption in classical model organisms, such as mice and Drosophila.

    4. Reviewer #2 (Public review):

      Summary:

      Kim and Parsons reviewed the nitroreductase (NTR)/prodrug system: when engineered cells expressing the enzyme NTR are treated with prodrug (e.g. metronidazole), NTR converts the prodrug into a cytotoxic compound that kills these cells. The review covers how the system has been developed, spatiotemporal control of targeted cell ablation, and its broad utility to study regenerative mechanisms, model human diseases, and screen chemicals to discover pro-regenerative and protective compounds. They further discussed the newer version of NTR, a more potent prodrug, and experimental design, which not only expands the possible utility of the NTR/prodrug system, but also allows the research community to develop a precise, reproducible and versatile platform.

      Strengths:

      The review summarized landmark applications of the NTR/prodrug system and recent studies in model organisms, with a focus on the model organism zebrafish. The review provides a good gateway to understanding the system and considering regenerative studies.

      Weaknesses:

      None.

      Comments on revisions:

      The authors have addressed the previous points, and the manuscript has been greatly improved.

    5. Reviewer #3 (Public review):

      Summary:

      This manuscript by Kim and Parsons presents an overview of the nitroreductase/metronidazole (NTR/MTZ) cell ablation system.

      Strengths:

      This manuscript nicely places the NTR/MTZ system in the context of other cell ablation methods, with a discussion of their respective advantages and disadvantages. This review is particularly useful for highlighting the many ways the NTR/MTZ system has been applied to study the regeneration of multiple cell types and to model different degenerative human diseases. The review concludes with a discussion on recent improvements made to the system and practical considerations and "best practices" for NTR-based experiments. This review could be a helpful resource, especially for researchers new to regeneration or cell ablation studies.

      Comments on revised version:

      I thank the authors for revising the manuscript to expand their discussion of using the prodrug/NTR system in other model organisms while also revising the abstract to make it clear this review will be zebrafish focused. With these revisions, this review provides an informative overview of how the prodrug/NTR system has been not only an important tool for regeneration studies but also instrumental in elevating the zebrafish as a regeneration model. That said, including other model organisms could have been a nice addition to the last section on experimental considerations, especially in the context of discussing potential barriers to wider adoption of the NTR system. However, given that the vast majority of studies using the NTR system are in zebrafish, the current scope of this review is understandable.

    1. eLife Assessment

      This study provides valuable contributions to establish canonical Dhh signaling as a primary mediator in the differentiation of Leydig cells and their steroidogenic capacity. Together, the experimental design using their established stem Leydig cell line alongside relevant genetically mutated models, both derived using the relevant Nile tilapia animal system, provided largely convincing evidence to support their conclusions. The work will be of broad interest to developmental biologists interested in differentiation of steroidogenic or hormone producing cells.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This manuscript by Zhao et al. investigates the canonical hedgehog pathway in testis development of Nile tilapia. They used complementary approaches with genetically modified tilapia and transfected TSL cells (a clonal stem Leydig cell line) previously derived from 3-month-old tilapia. The approach is innovative and provides a means to investigate DHH and each downstream component from the ptch receptors to the gli and sf1 transcription factors. They concluded that Dhh binds Ptch2 to stimulate Gli1 to promote an increase in Sf1 expression, leading to the onset of 11-ketotestosterone synthesis heralding the differentiation of Leydig cells in the developing male tilapia.

      Strengths of the methods and results:

      - The use of Nile tilapia is important as it is an important aquaculture species, it shares the genetic pathway for sex determination of mammalian species, and molecular differentiation pathways are highly conserved.
      - The approach is rigorous and incorporates a novel TSL, clonal stem Leydig cell model that they developed that is relatively faithful in following endogenous developmental steps and can produce the appropriate steroid.
      - Tilapia are relatively amenable to CRISPR/Cas9 targeting and, with their accelerated developmental time frame, provide an excellent model system to interrogate specific signaling pathways.
      - The stepwise analysis from dhh-gli-sf1 is thoughtful and well done.

      Achieved Aims: The authors set out to test the hypothesis that the canonical Dhh signaling pathway for Leydig cell differentiation and steroidogenic activity is mediated via ptch2 and gli1 regulation of sf1. The results are strong, although additional steps are needed to verify that redundancy/compensation is not contributing to the outcomes.

      This work is important for better understanding nuanced commonalities and differences in developmental pathways across species. Specific to Leydig cell differentiation and steroidogenesis, their work with tilapia supports conservation of the canonical Dhh pathway; however, there appear to be some differences in downstream mediators compared to mouse. Specifically, they conclude that ptch2/gli1 stimulates sf1 and steroidogenesis in tilapia, whereas Gli1 is dispensable in mouse. Instead, Gli3 has recently been shown to play an important role in stimulating Sf1 and supporting the hedgehog pathway.

    3. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Weaknesses of the methods and results:

      - Line 162: need to establish and verify the PKH26-labeled TSL cells were unaffected by the dhh-/- environment. No data to support the claim that they were unaffected.

      We thank the reviewer for this important comment. In dhh<sup>-/-</sup> recipient testes, PKH26-labeled TSL cells were observed within the interstitial compartment (Fig. 3C3). Importantly, these PKH26-positive cells could be induced by SAG treatment to differentiate into Cyp11c1-positive steroidogenic cells (Fig. 3E3), indicating that they remained viable in the dhh<sup>-/-</sup> environment.

      We have revised the Results section (line 171–173) to “These results suggest that SLC differentiation is inhibited, whereas the survival and engraftment of PKH26-labeled TSL cells were not affected in dhh<sup>-/-</sup> XY tilapia testes.”

      - The rescued phenotype caused by the addition of ptch2-/- to the dhh-/- model is compelling. To further define potential ptch1 contributions, it would be helpful to examine the expression level of ptch1 in the context of the ptch2-/- and ptch2-/-;dhh-/- mutant animals. Any compensatory increase in ptch1 in either case, without obvious phenotype changes, would support the dominant role for ptch2.

      We thank the reviewer for this valuable suggestion. We have now performed RT-qPCR analysis of ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. As shown in Fig. S8, no significant differences in ptch1 mRNA levels were detected among these genotypes, indicating that loss of ptch2 does not induce compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. We have revised the Discussion section (line 277–290) to “The specificity for Ptch2 in this context might stem from unique co-receptor interactions or expression patterns within the testicular niche. To preliminarily assess potential compensatory regulation, we examined ptch1 expression in XY testes from WT, ptch2<sup>-/-</sup> and dhh<sup>-/-</sup>;ptch2<sup>-/-</sup> fish at 90 dah. No significant differences in ptch1 mRNA levels were detected among these genotypes (Fig. S8), suggesting that loss of ptch2 does not trigger compensatory upregulation of ptch1 at the transcriptional level under the conditions examined. Nonetheless, global ptch2 mutation affects multiple tissues, whereas our mechanistic focus is on SLC differentiation within the testicular niche. Moreover, the early embryonic lethality of global ptch1 mutation in tilapia (Liu et al., 2024) precludes direct assessment of its role in postnatal testis development. Therefore, although our findings strongly support a predominant role for Ptch2 in mediating Dhh signaling in SLCs, definitive resolution of receptor specificity will require future Leydig cell-specific conditional knockout models.”

      - The activity of individual gli factors needs additional reconciliation. The expression profiles for both alternative gli factors should be quantified in each knockout cell line to establish redundancy and/or compensation.

      We agree that quantifying the expression of alternative gli genes might be informative. In the present study, TSL-gli1<sup>-/-</sup> cells completely lose responsiveness to Dhh stimulation in the 8×GLI luciferase assay, whereas TSL-gli2<sup>-/-</sup> and TSL-gli3<sup>-/-</sup> cells retain normal pathway activation (Fig. 5B), which clearly indicates that Gli1 is the principal transcriptional effector in tilapia SLCs under our experimental conditions. Redundancy and/or compensation of alternative gli factors needs further genetic dissection in future studies.

      - Figure 5E: An important control is missing that includes evaluation of HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1.

      We do not consider HEK293 cells transfected with pcDNA3.1-OnGli1 without the addition of pGL3-sf1 to be an essential control in our study. In the dual-luciferase assays, we consider the pcDNA3.1 + pGL3 (empty reporter) and pcDNA3.1 + pGL3-sf1 controls to be sufficient.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Recommendations for improving the writing and presentation; minor corrections:

      - Include Park paper (Endocrinology 2007) somewhere near line 73. Need to acknowledge this paper as it is one of the first to connect Dhh to Sf1.

      We have now included the citation of Park et al. (Endocrinology 2007) in the Introduction (now line 81).

      - Include Kothandapani paper (PLoS Genetics 2020) somewhere near line 86. Need to acknowledge this paper as it is the only one to reconcile the data showing no difference in Gli1 or Gli2 knockouts, but loss of Leydig cell function due to Gli3 activity.

      We have now included the citation of Kothandapani et al. (PLoS Genetics 2020) in the Introduction (now line 97).

      - Please include sequences of B1 and B2 in sf1 promoter, how conserved are they to the canonical Gli binding sequence?

      We have revised the Results section (line 216–218) to “Functional annotation of its promoter region identified two conserved Gli1-binding motifs, B1 (AACCACCCA) and B2 (GAGCCACCCA)”.

      - Figure 1 or results text: please clarify that the dhh-/- model used is the delta13bp mutation.

      We have clarified in the Results section (line 133) that the dhh<sup>-/-</sup> model corresponds to the 13-bp (CAGGGATGCGGAC) frameshift deletion.

      - Figure 5E legend: please clarify that HEK293 cells are used

      We have revised the Figure 5E legend to explicitly state that the dual-luciferase reporter assays were performed in HEK293 cells. Revised legend sentence (line 743-746): HEK293 cells were co-transfected with pRL-TK, pGL3, pcDNA3.1, pGL3-sf1, pcDNA3.1-On Gli1, and the indicated cold probe constructs, and luciferase activity was measured 48 hours post-transfection.

      - Figure S5E: * indicates the heteroduplex; it seems that there is a heteroduplex highlighted with the asterisk at ~600bp size; based on homozygous and mutant bands, it seems the asterisk should be highlighting the duplex near those sized bands. What are the bands up at ~600bp?

      We thank the reviewer for the careful observation. In Figure S5E, the bands observed at ~600 bp represent heteroduplex products formed during the re-annealing of PCR amplicons derived from heterozygous individuals. During denaturation and re-annealing, WT and mutant strands can pair in different configurations, generating distinct heteroduplex conformations that migrate more slowly than homoduplex products in PAGE. As a result, two heteroduplex bands are visible at ~600 bp, reflecting alternative mismatched duplex structures. The homoduplex WT and mutant bands are indicated separately by arrows.

      - Figure S7F: dhh-/- data are missing

      We thank the reviewer for pointing out this omission. The missing dhh<sup>-/-</sup> dataset has now been added to Figure S7F, and the figure has been updated accordingly.

    1. eLife Assessment

      This important study provides a comprehensive multi-omics characterization of Leishmania donovani stage differentiation, offering insights into the molecular basis of parasite adaptation across host environments. The authors present convincing evidence that stage transitions are not driven by genomic variation but instead rely on coordinated post-transcriptional regulation, including mRNA turnover, translation, and protein degradation. Although experimental validation of these findings and conclusions remains to be completed, the integration of diverse, high-quality datasets establishes a robust resource that will be of broad utility to researchers investigating Leishmania biology and life-cycle progression.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      The authors describe co-regulated gene modules underlying stage differentiation in Leishmania donovani through a system-level analysis of multiple molecular layers. Using amastigotes isolated from infected hamster spleens and corresponding culture-derived promastigotes, they analyzed genomic variation, transcript abundance, protein levels, phosphorylation states, and metabolite profiles. By combining these, the study identified potential regulatory mechanisms associated with parasite differentiation and generated hypotheses regarding how gene expression is coordinated across different levels.

      Strengths:

      A major strength of the study is the breadth of the dataset generated. The integration provides an unusually comprehensive view of molecular changes associated with Leishmania differentiation in vitro. Such multi-layer datasets involving bona fide vertebrate host stages remain relatively rare in parasitology and will likely become a valuable resource for the molecular parasitology community. In addition, the use of amastigotes isolated from infected hamsters rather than relying on axenic models provided a biologically relevant framework for the analyses.

      The revised manuscript improved several aspects of the original. The RNA-seq analysis is described with a clearer pipeline, and several claims regarding causal regulatory feedback associations have been appropriately toned down. Among the observations reported, the association between parasite differentiation and proteasome-mediated protein degradation is particularly remarkable. The combination of quantitative proteomics with pharmacological inhibition of the proteasome with lactacystin provides support for a role for protein turnover in developmental transitions and paves the way for future mechanistic studies.

      Weaknesses:

      Most regulatory interpretations remain largely inferential or indirect. The integration identifies correlations between different levels, but direct functional validation is limited/absent. Many of the descriptions should not be interpreted as validated. As highlighted by the authors in this revised version, the mechanistic studies will be part of future work and are beyond the scope of the current work. Of note, the attempt to confirm lactacystin-induced inhibition of proteasomal activity via anti-polyUb immunoblotting did not demonstrate the expected outcome of an increase in overall poly-ubiquitylation.

    3. Reviewer #2 (Public review):

      Pescher and colleagues present a revised manuscript detailing the multi-omic characterisation of Leishmania donovani amastigote to promastigote differentiation and integration of this data. The molecular pathways that regulate Leishmania life-stage transitions are still poorly understood, with many approaches exploring single proteins/RNAs etc in a reductionist manner. This paper takes a systems-scale approach and does a good job of integrating the disparate -omics datasets to generate hypotheses about the intersections of regulatory proteins that are associated with life-cycle progression. The differentiation step studied is from amastigote to promastigote using hamster-derived amastigotes, which is a major strength. The use of hamsters permits the extraction of parasites that are host adapted and represent "normal", host-adapted Leishmania ploidy; the promastigote experiments are performed at a low passage number. Therefore, this is a strength of the work as it reduces the interference from the biological plasticity of Leishmania when it is cultured outside the host for prolonged periods. The multi-omics datasets presented are robust in their acquisition and analysis and will form an excellent resource for researchers studying the molecular events (particularly proteasomal protein degradation and phosphorylation) during life-stage progression.

      Overall, in the absence of follow up experiments on specific individual examples, some of the claims in the original submission were toned down and reflect a more neutral description of the data now. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

    4. Reviewer #3 (Public review):

      Summary:

      The authors proposed to use 5-layer systems level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani.

      This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view of how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      The revised version, with an increased sample number for the RNA-seq, now makes the authors' assumptions adequately supported by the data obtained.

      The use of a proteasomal inhibitor adds interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Comments on revised version:

      The authors have appropriately addressed my comments and questions from the initial review process. My remaining concern relates to the lack of evidence to confirm proteasomal inhibition by lactacystin in both promastigotes and amastigotes. The immunoblotting experiment newly presented does not reveal a clear increase in the levels of poly-ubiquitylated proteins in treated parasites. In fact, poly-Ub levels were lower at both the 4h and 18h timepoints of treatment. If alternative antibodies or additional immunoblots are not available, the manuscript would benefit from an expanded discussion of this observation and potential explanations. In particular, the interpretation that lactacystin stabilizes ama- and pro-specific degradation would be greatly strengthened by such validation.

      Reviewer #2 (Public review):

      General comments on the revisions:

      My view is that the authors have made significant, satisfactory changes that address the comments and queries I made on the original manuscript (Review Commons).

      There are two areas where the authors had to make major changes or justifications that merit further comment:

      RNA-seq.

      The most significant issue was the originally underpowered RNA-seq, which had only two replicates. This has been repeated with four replicates, and it has not led to changes in the interpretation of the data between the original study and this one. One comment that the authors make in their response to this was: "Given the robustness of the stage-specific transcriptome, and the legal constrains associated with the use of animals, we chose to limit the number of replicates to the necessary". Ensuring that animal experiments are properly powered, so that maximum robustness is obtained from the minimum sample size, is an important part of experimental design for the ethical use of animal models. Essentially, the replication here could have been avoided if the original study had used one more animal. However, the new version of the RNA-seq brings appropriate confidence to the interpretation of the data.

      Phosphoproteomics.

      The authors provide a robust justification of their strategy for the phosphoproteomics and highlight the inclusion criteria for phosphosites: "Phosphosites were only considered if detected with high confidence (identification FDR<1%) and high localisation confidence (localisation probability >0.75) in at least one replicate". The way missing values were dealt with is explained: "For statistical analyses, missing values within a given condition were imputed with a well-established algorithm (MLE) only when at least one observed value was present in that condition." This fills in some of the gaps I found in the original manuscript, and I am satisfied that the data analysis is entirely appropriate for a discovery/systems-based approach such as this one. The authors also edited the manuscript to reflect that "occupancy" or "stoichiometry" might not be the best description of what they were presenting, and switched to the terminology "normalised phosphorylation level"; I think this is an appropriate response.
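      To make the two quoted rules concrete, the sketch below expresses them as filters. It is an illustration under stated assumptions, not the authors' pipeline: the record layout and the names `Observation`, `passes_inclusion` and `imputation_allowed` are hypothetical, and the MLE imputation step itself is not reproduced, only the condition that decides whether it may be applied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    """Hypothetical per-replicate record for one phosphosite."""
    fdr: float       # identification FDR in this replicate (0.01 == 1%)
    loc_prob: float  # localisation probability in this replicate


def passes_inclusion(observations: List[Optional[Observation]],
                     max_fdr: float = 0.01,
                     min_loc_prob: float = 0.75) -> bool:
    """Keep a site if at least one replicate has FDR < 1% and localisation prob > 0.75.

    None marks a replicate in which the site was not detected.
    """
    return any(
        obs is not None and obs.fdr < max_fdr and obs.loc_prob > min_loc_prob
        for obs in observations
    )


def imputation_allowed(intensities: List[Optional[float]]) -> bool:
    """Missing values in a condition may be imputed only if one value was observed there."""
    return any(value is not None for value in intensities)


if __name__ == "__main__":
    site = [None, Observation(fdr=0.005, loc_prob=0.90), Observation(fdr=0.02, loc_prob=0.95)]
    print(passes_inclusion(site))             # one replicate meets both thresholds
    print(imputation_allowed([None, None]))  # condition with no observed value: skip imputation
```

      Under these assumptions, a site detected confidently in a single replicate is retained, whereas a condition with no observed intensity is left without imputed values rather than filled from nothing.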

      Overall, in the absence of follow-up experiments on specific individual examples, some of the claims in the original submission have been toned down and now reflect a more neutral description of the data. Significantly, the data still underpin a key role for regulation of the ribosome between the amastigote and promastigote stages (and during the differentiation process). The recursive and reciprocal links between the phosphorylation and ubiquitination systems are interesting and present many opportunities for future investigation.

      Reviewer #3 (Public review):

      Summary:

      The authors proposed to use a 5-layer, systems-level analysis (genomics, transcriptomics, proteomics / protein degradation, metabolomics, phosphoproteomics) to uncover how post-transcriptional mechanisms regulate stage differentiation in Leishmania donovani.

      This enabled the identification of several potential regulatory networks, including the regulation of stage-specific gene clusters by RNA stabilisation or decay, proteasomal degradation and protein phosphorylation.

      In the new version of this manuscript, the authors have addressed all questions raised by the reviewers.

      Strengths:

      Although some observations in this study have already been described in the literature, the integrated analysis applied here provides a novel view of how different levels of post-transcriptional networks regulate Leishmania differentiation. This "5-layer system" represents the first analysis of this depth in kinetoplastid parasites.

      The revised version, with an increased sample number for the RNA-seq, now makes the authors' assumptions adequately supported by the data obtained.

      The use of a proteasomal inhibitor adds interesting insight into how protein degradation is involved in parasite differentiation, confirming previous observations in the literature, and helps to explain the discrepancies between mRNA and protein expression in the different stages.

      Weaknesses:

      While this work provides an impressive and foundational dataset, it opens the door for future research to rigorously validate these initial findings and conclusions.

      Significance and Impact in the field.

      The different datasets generated in this study will be of great interest to the parasitology community, either to be used for hypothesis generation, to validate data from other sources, etc.

      The multi-layered analysis performed here identified a series of potential feedback loops and regulatory networks to be further explored in organisms that lack transcriptional control.

      According to the reviewers’ comments, we made the following minor changes:

      As suggested by reviewer 1, we have extended the discussion of the results related to the analysis of the ubiquitination pattern by Western blot analysis as follows: “Proteasome inhibition blocked amastigote-to-promastigote differentiation, without inducing rapid global accumulation of ubiquitinated proteins (Figure S7C, upper panel) consistent with a quiescent-like state and low basal ubiquitin–proteasome system activity in amastigotes. After 18 h, ubiquitination levels remained similar to untreated cells, indicating that protein turnover and ubiquitin accumulation are primarily driven by developmental remodeling rather than acute proteasome inhibition. In promastigotes, the lack of detectable change (Fig. S7C, lower panel) may also reflect high basal ubiquitination, engagement of compensatory pathways such as autophagy, and/or only partial proteasome inhibition.”

      Recommendations for the authors:

      Reviewer #3 (Recommendations for the authors):

      Minor comments:

      - Supplementary figure 3 is not referenced in the main text.

      - The authors removed the "infinite" sign from figures 3 and 4 to better present the data according to their chosen approach to missing values when LFQ=0. However, the sign is still present in the respective figure legends, please adjust.

      Supplementary Figure 3 (Figure S3) is now referenced in the main text as requested.

      The "infinite" sign has been removed from the legends of Figures 3 and 4 as requested.

    1. eLife Assessment

      This study provides valuable insights into mitochondrial cristae organization in Plasmodium falciparum, particularly in the context of its divergent MICOS composition. The authors present convincing evidence, supported by phenotypic and morphological analyses, that cristae junction maintenance can be uncoupled from de novo cristae formation, reinforcing an emerging model of mitochondrial inner membrane organization. Notably, the absence of Mic10 alongside an enlarged and divergent MICOS complex highlights an intriguing evolutionary adaptation, although further characterization of the complex would strengthen the study's overall significance.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      The manuscript by Tassan-Lugrezin et al. confirms the existence of the MICOS complex in the causative agent of malaria, Plasmodium falciparum. Prior to this study, only one of the two core MICOS subunits, Mic60, had been found by homology search to be encoded in the apicomplexan parasite's genome. This study demonstrates the absence of the other core subunit, Mic10. It also identifies another MICOS subunit, Mic19, which co-migrates with Mic60 in a very high-molecular-weight complex upon blue native polyacrylamide gel electrophoresis. The authors then demonstrate that expression of both Mic60 and Mic19 is considerably upregulated during the differentiation of P. falciparum from the pathogenic asexual blood stage (ABS) to gametocytes, which correlates with the activation of oxidative phosphorylation during this process. While deletion of Mic19, Mic60 or both simultaneously does not affect this transition, the cristae are nevertheless deformed. More significantly, crista junctions are significantly reduced, indicating that MICOS serves the same function in apicomplexans as it does in opisthokonts: maintaining crista junctions. Furthermore, the genetic interaction of mic60 and mic19, evident from the augmented cristae deformation when both are deleted, is additional evidence of their biochemical interaction, consistent with their similar complexome profiles. This study represents an important contribution to our understanding of MICOS evolution. The study also shows that proper cristae formation is not essential for Plasmodium life cycle progression under in vitro conditions. Moreover, mutant gametocytes are still able to mate in the mosquito vector, albeit with lower efficiency.

      Strengths:

      The study is the result of a lot of technically challenging work in the model organism Plasmodium. The technically difficult life cycle progression experiments are well performed as far as I can tell. The electron microscopy is very well done and rigorously analyzed to obtain information about crista parameters. In particular, the authors were able to quantify the occurrence and diameter of crista junctions, which is very challenging in small mitochondria with small cristae. Finally, the authors provide convincing support that Mic60 and the newly discovered Mic19 act to shape crista junctions, and that MICOS can apparently carry out this function without the core subunit Mic10.

      Weaknesses:

      In its current form, the study has conceptual weaknesses. The authors focus on the development of cristae from a most likely acristate state. This is valid, but more insight could be gained by considering their result in light of the discovery of the first functioning MICOS complex that lacks one of its two core proteins, Mic10. The surprisingly large size of Plasmodium MICOS is also not really considered by the authors. This brings me to the second weakness, in my opinion. While I think the study represents a lot of work utilizing appropriate and crucial experiments, the complexome data seem not to have been explored enough. These data revealed Mic19, but what other potential subunits co-migrate with Mic60 and Mic19 that could explain the large size of Plasmodium MICOS? Also, what is the consequence of the loss of Mic60 and Mic19 on the mitoproteome? Perhaps other MICOS subunits could be identified by their depletion in the knockouts relative to the parental cell line.

      Comments on latest version:

      I am reviewing this manuscript again after reviewing it for Review Commons. I appreciate the authors' responses to my comments. The new version is improved but, in my opinion, still needs more work.

      The remaining revisions involve changes to the text and interpretations, and obtaining more data from existing datasets or databases. I still think one experimental control is necessary to substantiate the authors' claim about membrane potential.

    3. Reviewer #2 (Public review):

      This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum - an organism that unusually shows an acristate mitochondrion during the asexual part of its life cycle and then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins, follow the timing of expression during the parasite life cycle and attempt (unsuccessfully) to localise them within the parasite; the lack of signal is attributed to very low expression levels. They also genetically delete both genes singly and in combination and phenotype the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using MitoTracker labelling, the authors observe differences in mitochondrial organisation between wild-type and transgenic gametocytes. Further investigation at higher resolution using EM techniques provides data supporting their hypothesis that PfMIC60 and PfMIC19 are important for organising the parasite mitochondrion.

      The manuscript is interesting and is an intriguing use of a well-studied organism of medical importance to answer fundamental biological questions. Given the essentiality of mitochondrial respiration for parasite survival in the mosquito, it is surprising that the single and double knock-out transgenics do not give a severe phenotype. However, the authors have been rigorous in characterizing the impact of genetic deletion of both genes throughout the parasite life cycle. Subtle differences in mitochondrial organisation were observed, consistent with their hypothesis that PfMIC60 and PfMIC19 play roles in mitochondrial organisation. Therefore, these data presented give new insights into an organelle that dramatically changes during parasite development and adds to our knowledge of mitochondrial biology in a highly unusual organism.

      Comments on revised version:

      I previously reviewed this manuscript for Review Commons. This version is greatly improved and the authors should be commended for addressing all comments raised.

    4. Reviewer #3 (Public review):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and maintaining cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin sections and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      Major comments (from the previous round of review):

      (1) The authors should better present their findings in the right context, in particular by:

      (i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      (ii) at the end of the introduction, the motivating hypothesis is formulated ad hoc: "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is the maintenance of cristae junctions (the hypothesis is still plausible and testing it is important).

      (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum, but this is not compared to the expected length or the size in S. cerevisiae.

      (3) lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins?". The authors should explain which data they are referring to, as some of the data in Figs 3 and 4 look similar and all significance tests relate to the wild type, not to comparisons between the different mutants, so it is not clear whether any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      (6) lines 380-385: "... thus suggesting that membrane invaginations still arise but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      (7) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      Significance:

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle and defects in infection effectivity are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism.

      The limitation of the study stems from what is already known about MICOS and its subunits in other organisms. MICOS subunit knockouts have been characterised in great detail in yeast and humans with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight on MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis.

      Exploring the role of MICOS in an early-divergent organism and human parasite is however important given the divergence found in mitochondrial biology and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging as the morphological defects are diverse and the fitness defects appear moderate/mild.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus widening our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest to specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

      Comments on revised version:

      The authors have addressed all of my previous comments in the updated manuscript version.

    5. Author response:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      This manuscript reports the identification of putative orthologues of mitochondrial contact site and cristae organizing system (MICOS) proteins in Plasmodium falciparum - an organism that unusually shows an acristate mitochondrion during the asexual part of its life cycle and then develops cristae as it enters the sexual stage of its life cycle and beyond into the mosquito. The authors identify PfMIC60 and PfMIC19 as putative members and study these in detail. The authors add HA tags to both proteins, follow the timing of expression during the parasite life cycle and attempt (unsuccessfully) to localise them within the parasite. They also genetically deleted both genes singly and in combination and phenotyped the effect on parasite development. They show that both proteins are expressed in gametocytes and not asexuals, suggesting they are present at the same time as cristae development. They also show that the proteins are dispensable for the entire parasite life cycle investigated (asexuals through to sporozoites); however, there is some reduction in mosquito transmission. Using EM techniques, they show that the morphology of gametocyte mitochondria is abnormal in the knockout lines, although there is great variation.

      Major comments:

      The manuscript is interesting and is an intriguing use of a well-studied organism of medical importance to answer fundamental biological questions. My main comments are that there should be greater detail on the methodology and the statistical tests used. Also, the mosquito transmission assays (which are notoriously difficult to perform) show substantial variation between replicates, and the statistical tests and data presentation are not clear enough to support the claimed reduction in transmission. Perhaps this could be improved with clearer text?

      We would like to thank the reviewer for taking the time to review our manuscript. We are happy to hear the reviewer thinks the manuscript is interesting and thank the reviewer for their constructive feedback.

      To clarify the statistical analyses used, we included a new supplementary dataset with all statistical analyses and p-values indicated per graph. Furthermore, figure legends now include the information on the exact statistical test used in each case.

      Regarding mosquito experiments, while we indeed reported a reduction in transmission and oocysts numbers, we are aware that this effect might be due to the high variability in mosquito feeding assays. To highlight this point, we deleted the sentence “with the transmission reduction of [numbers]….” and we included the sentence “The high variability encountered in the standard membrane feeding assays, though, partially obstructs a clear conclusion on the biological relevance of the observed reduction in oocyst numbers“

      More specific comments to address:

      Line 101/Fig1E (and figure legend) - What is this heatmap showing. It would be helpful to have a sentence or two linking it to a specific methodology. I could not find details in the M+M section and "specialized, high molecular mass gels" does not adequately explain what experiments were performed. The reference to Supplementary Information 1 also did not provide information.

      We added the information “high molecular mass gels with lower acrylamide percentage” to clarify methodology in the text. Furthermore, we extended the figure legend to include all relevant information. Further experimental details can be found in the study cited in this context, where the dataset originates from (Evers et al., 2021).

      Line 115 and Supplementary Figure 2C + D - The main text says that the transgenic parasites contained a mitochondrially localized mScarlet for visualization and localization, but in the supplementary figure 2 it shows mitotracker labelling rather than mScarlet. This is very confusing. The figure legend also mentions both mScarlet and MitoTracker. I assume that mScarlet was used to view in regular IFAs (Fig S2C) and the MitoTracker was used for the expansion microscopy (Fig S2D)?

      Please clarify.

      We thank the reviewer for pointing this out – this was indeed incorrectly annotated. We used the endogenous mito-mScarlet signal in IFA and mitoTracker in U-ExM. The figure annotation has now been corrected.

      Figure 2C - what is the statistical test being used (the methods say "Mean oocysts per midgut and statistical significance were calculated using a generalized linear mixed effect model with a random experiment effect under a negative binomial distribution." but what test is this?)?

      The statistical test is now included in the Materials and Methods section with the sentence “The fitted model was used to obtain estimated means and contrasts and were evaluated using Wald Statistics”. The test is now also mentioned in the figure legend.

      Also the choice of a log10 scale for oocyst intensity is an unusual choice - how are the mosquitoes with 0 oocysts being represented on this graph? It looks like they are being plotted at 10^-1 (which would be 0.1 oocysts in a mosquito which would be impossible).

      As the data spans three orders of magnitude with low values being biologically meaningful, we decided that a log scale would best facilitate readability of the graph. As the 0 values are also important to show, we went with a standard approach to handle 0s in log transformed data and substituted the 0s with a small value (0.001). We apologize for not mentioning this transformation in the manuscript. To make this transformation transparent, we added a break at the lower end of the log-scaled y-axis and relabelled the lowest tick as ‘0’. This ensures that mosquitoes with zero oocysts are shown along the x-axis without being assigned an artificial value on the log scale. We would furthermore like to highlight that for statistics we used the true value 0 and not 0.001.
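      The plotting convention described above (zeros drawn at a placeholder on the log axis and labelled "0", while the statistics use the true zeros) can be illustrated with a short sketch. The 0.001 placeholder is taken from the response; the function names and the idea of computing positions with `math.log10` are assumptions for illustration, not the authors' plotting code.

```python
import math

PLOT_FLOOR = 0.001  # placeholder value for zero counts, as stated in the response


def plot_positions(counts):
    """Log10 y-positions for plotting; zero counts are drawn at the placeholder floor."""
    return [math.log10(c) if c > 0 else math.log10(PLOT_FLOOR) for c in counts]


def tick_label(value):
    """Label the floor tick as '0' so no artificial value appears on the axis."""
    return "0" if value == PLOT_FLOOR else f"{value:g}"


if __name__ == "__main__":
    oocysts = [0, 1, 10, 100]  # hypothetical per-midgut counts
    print(plot_positions(oocysts))
    print([tick_label(v) for v in (PLOT_FLOOR, 1, 10, 100)])
    # Statistics are computed on the untransformed counts, zeros included.
    print(sum(oocysts) / len(oocysts))
```

      Keeping the substitution confined to the plotting functions means the counts passed to any statistical model are never altered, which matches the point made in the response.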

      Figure 2D - it is great that the data from all feeding replicates have been shared; however, it is difficult to conclude any meaningful impact on transmission with the knockout lines when there is so much variation and so few mosquitoes were dissected for some datapoints (10 mosquitoes is a very small sample size). For example, Exp1 shows a clear decrease in mic19- transmission, but Exp2 does not show as great an effect. Similarly, why does the double knockout have better transmission than the single knockouts? Surely there would be a greater effect?

      We agree with the reviewer and with the new sentence added, as per major point, we hope we clarified the concept. Note that original Figure 2D has been moved to the supplementary information, as per minor comment of another reviewer.

      Figure 3 legend - Please add which statistical test was used and the number of replicates.

      Done

      Figure 4 legend - Please add which statistical test was used and the number of replicates.

      Done. Regarding replicates, note that while we measured over 100 cristae from over 30 mitochondria, these all stem from the same parasite culture.

      Figure 5C - the 3D reconstructions are very nice, but what does the red and yellow coloring show?

      Indeed, the information was missing. We added it to the figure legend.

      Line 352 - "Still, it is striking that, despite the pronounced morphological phenotype, and the possibly high mitochondrial stress levels, the parasites appeared mostly unaffected in life cycle propagation, raising questions about the functional relevance of mitochondria at these stages."

      How do the authors reconcile this statement with the proven fact that mitochondria-targeted antimalarials (such as atovaquone) are very potent inhibitors of parasite mosquito transmission?

      Our original sentence was reductive. What we wanted to state was related to the functional relevance of crista architecture and overall mitochondrial morphology rather than the general functional relevance of the mitochondria. We changed the sentence accordingly.

      Furthermore, even though we do not discuss this in the article, we are aware of mitochondria-targeting drugs that are known to block mosquito transmission. We want to point out that it is difficult to distinguish the effect of ETC disruption on energy conversion from its impact on the essential pathway of pyrimidine synthesis, which is highly relevant in microgamete formation. Still, a recent paper from Sparkes et al. 2024 showed the essentiality of mitochondrial ATP synthesis during gametogenesis, so it is very likely that mitochondrial energy conversion is highly relevant for transmission to the mosquito.

      Reviewer #1 (Significance):

      This manuscript is a novel approach to studying mitochondrial biology and opens up many unanswered questions for further research. Currently there are limitations in the use of statistical tests and in the detail of the methodology, but these could easily be addressed with a bit more analysis or better explanation in the text.

      This manuscript could be of interest to readers with a general interest in mitochondrial cell biology and those within the specific field of Plasmodium research.

      My expertise is in Plasmodium cell biology.

      We thank the reviewer for the praise.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Major comments:

      (1) In my opinion, the authors tend to sensationalize or overinterpret their results. The title of the manuscript is very misleading. While MICOS is certainly important for crista formation, it is not the only factor, as ATP synthase dimer rows make a highly significant contribution to crista morphology. Thus, one can argue with equal validity that ATP synthase should be considered the 'architect', as the conformation of the dimers and rows modulates positive curvature. Secondly, while cristae are still formed upon mic60/mic19 gene knockout (KO), they are severely deformed and likely dysfunctional (see below). Thus, I do not agree with the title that MICOS is dispensable for crista formation, because the authors' results show that it clearly is essential. So, the title should be changed.

      We thank the reviewer for taking the time to review our manuscript.

      Based on the reviewer’s interpretation, we conclude that the title does not come across as intended. We have changed the title to: “The role of MICOS in organizing mitochondrial cristae in malaria parasites”

      The Discussion section starting from line 373 also suffers from overinterpretation, as well as being repetitive and hard to understand. The authors infer that MICOS stability is compromised less in the single KOs (sKO) than in the mic60/mic19 double KO (dKO). MICOS stability was never directly addressed here and the composition of the MICOS complex is unaddressed, so it does not make sense to speculate based on such tenuous connections. The data suggest to me that mic60 and mic19 are equally important for crista formation and crista junction (CJ) stabilization, and the dKO has a more severe phenotype than either KO, further demonstrating that neither is epistatic.

      We do agree with the reviewer’s notion that we did not address complex stability, and our wording did not make this sufficiently clear. We shortened and rephrased the paragraph in question.

      The following paragraphs (lines 387 to 422) continue with such unnecessary overinterpretation to the point that they are confusing and contradictory. Line 387 mentions an 'almost complete loss of CJs' and then line 411 mentions an increase in CJ diameter, both upon Mic60 ablation. I do not think this discussion brings any added value to the manuscript, and it should be shortened. Yes, maybe there are other putative MICOS subunits that may linger in the KOs that are further destabilized in the dKO, or maybe Mic60 remains in the mic19 KO (and vice versa) to somehow salvage more CJs, which is not possible in the dKO. It is impossible to say with confidence how ATP synthase behaves in the KOs with the current data.

      We shortened this paragraph.

      (2) While the authors went to impressive lengths to detect any effect on life cycle progression, none was found except for a reduction in oocyst count. However, the authors did not address any direct effect on mitochondria, such as OXPHOS complex assembly, respiration, or membrane potential. This seems like a missed opportunity, given the team's previous and very nice work mapping these complexes by complexome profiling. However, I think there are some experiments the authors can still do to address any mitochondrial defects using what they have and without resorting to complexome profiling (although this would be definitive if it is feasible):

      i) Quantification of MitoTracker Red staining in WT and KOs. The authors used this dye to visualize mitochondria to assay their gross morphology, but unfortunately not to assay membrane potential in the mutants. The authors can compare relative intensities of the different mitochondria types they categorized in Fig. 3A in 20-30 cells to determine if membrane potential is affected when the cristae are deformed in the mutants. One would predict they are affected.

      Interesting suggestion. As our staining and imaging conditions are suitable for such analysis (as demonstrated by Sarazin et al., 2025, https://www.biorxiv.org/content/10.1101/2025.11.27.690934v1), we performed the measurements on the same dataset that we collected for Figure 3. However, we did not detect any difference in MitoTracker intensity between the different lines. The result of this analysis is included in the new version of Supplementary Figure S6.

      ii) Sporozoites are shown in Fig S5. The authors can use the same set up to track their motion, with the hypothesis that they will be slower in the mutants compared to WT due to less ATP. This assumes that sporozoite mitochondria are active as in gametocytes.

      While theoretically plausible and informative, we currently do not know the relevance of mitochondrial energy conversion for general sporozoite biology or specifically features of sporozoite movement. Given the required resources and time to set this experiment up and the uncertainty whether it is a relevant proxy for mitochondrial functioning, we argue it is out of scope for this manuscript.

      iii) Shotgun proteomics to compare protein levels in mutants compared to WT, with the hypothesis that OXPHOS complex subunits will be destabilized in the mutants with deformed cristae. This could be indirect evidence that OXPHOS assembly is affected, resulting in destabilized subunits that fail to incorporate into their respective complexes.

      While this experiment could potentially further our understanding of the interaction between MICOS and levels of OXPHOS complex subunits we argue that the indirect nature of the evidence does not justify the required investments.

      To expedite resubmission, the authors can restrict the cell lines to WT and the dKO, as the latter has a stronger phenotype than the individual KOs, and conclusions from this cell line are valid for overall conclusions about Plasmodium MICOS.

      I would also add that complexome/shotgun proteomics may be a useful tool for identifying other putative MICOS subunits, by determining whether proteins sharing the same complexome profile as PfMic60 and Mic19 are affected. This would address the overinterpretation problem of point 1.

      (3) I am aware of the authors' previous work in which they were not able to detect cristae in ABS, and thus concluded that these are truly acristate. This may very well be true, or there may be immature crista forms that evaded detection at the resolution used in their volumetric EM acquisitions. The mitochondria and gametocyte cristae are quite small anyway, so it is not unreasonable to assume that putative rudimentary cristae in ABS may be smaller still. The minute levels of complex III and IV, plus complex V dimers, previously detected in ABS by the authors using complexome profiling would argue for the presence of minuscule and/or very few cristae.

      I think that the authors should hedge their claim that ABS is acristate by briefly stating that there is still a possibility that minuscule cristae may have been overlooked previously.

      We acknowledge that we cannot demonstrate the absolute absence of any membrane irregularities along the inner mitochondrial membrane. At the same time, if such structures were present, they would be extremely small and unlikely to contain the full set of proteins characteristic of mature cristae. For this reason, we consider it appropriate to classify ABS mitochondria as acristate. To reflect the reviewer’s point while maintaining clarity for readers, we have slightly adjusted our wording in the manuscript, changing ‘fully acristate’ to ‘acristate’.

      This brings me to the claim that the Mic19 and Mic60 proteins are not expressed in ABS. This is based on the lack of signal from the epitope tag; a weak signal is detected in gametocytes. Thus, one can counter that Mic19 and Mic60 are also expressed in ABS, but below the detection limit of the assay, as the proteins exhibit low expression levels even when mitochondrial activity is upregulated.

      We agree with the reviewer that the absence of a detectable epitope-tag signal does not definitively exclude low-level expression, and we have therefore replaced the term ‘absent’ with ‘undetectable’ throughout the manuscript. In the context of previous findings of low-level transcripts of these proteins in studies by López-Barragán et al. and Otto et al., we also added the sentence “The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.” to the discussion. At the same time, we would like to clarify that transcript levels for both genes fall within the <25th percentile, suggesting that these low values likely represent background signal rather than biologically meaningful expression. This interpretation is further supported by proteomic datasets in PlasmoDB, which report PfMIC19 and PfMIC60 expression in gametocyte and mosquito stages, but not in asexual blood stages.

      To address this point, the authors should determine whether mature mic60 and mic19 mRNAs are detected in ABS in comparison to the dKO, which will lack either transcript. RT-qPCR using polyT primers can be employed to detect these transcripts. If the levels of these mRNAs in WT ABS are equivalent to those in the dKO, the authors can make a pretty strong case for the absence of cristae in ABS.

      We appreciate the reviewer’s suggestion. As noted in the Discussion, existing transcriptomic datasets already show detectable MIC19 and MIC60 mRNAs in ABS. For this reason, we expect RT-qPCR to reveal low (but not absent) levels of both transcripts, unlike the true loss expected to be observed in the dKO. Because such residual signals have been reported previously and their biological relevance remains uncertain, we do not believe transcript levels alone can serve as a definitive indicator of cristae absence in ABS.

      They should highlight the twin CX9C motifs that are a hallmark of Mic19 and other proteins that undergo oxidative folding via the MIA pathway. Interestingly, the Mia40 oxidoreductase that is central to MIA in yeast and animals, is absent in apicomplexans (DOI: 10.1080/19420889.2015.1094593).

      Searching for the CX9C motifs is a valuable suggestion. In response to the reviewer’s suggestion, we analysed the conservation of the motif in PfMIC19 and included this in a new figure panel (Figure 1F).

      Did the authors try to align Plasmodium Mic19 orthologs with conventional Mic19s? This may reveal some conserved residues within and outside of the CHCH domain.

      In response to this comment we made Figure 1 F, where we show conserved residues within the CHCH domains of a broad range of MIC19 annotated sequences across the opisthokonts, and show that the Cx9C motifs are conserved also in PfMIC19. Outside the CHCH domain, we did not find any meaningful conservation, as PfMIC19 heavily diverges from opisthokont MIC19.

      (5) Statistical significance. Sometimes my eyes see population differences that are considered insignificant by the statistical methods employed by the authors, e.g. Fig. 4E, mutants compared to WT, especially the dKO. Have the authors considered using other methods, such as Student's t-test, for pairwise comparisons?

      The graphs in Figures 3, 4 and 5 have been revised: they are now plotted on a linear scale as violin plots (also following a suggestion further down in the reviewer’s comments). We believe that this improves interpretability. We kept ANOVA as the statistical test to ensure correction for multiple comparisons, which cannot be performed with a standard t-test. A full overview of the statistics and exact p-values can also be found in the newly added Supplementary Information 2.

      Minor comments:

      Line 33: Anaerobes (e.g. Giardia) have mitochondria that do not produce ATP, unlike aerobic mitochondria.

      We acknowledge that producing ATP via OXPHOS is not a characteristic of all mitochondria-like organelles (e.g. mitosomes), which is why these are typically classified separately from canonical mitochondria. When not considering mitochondria-like organelles, energy conversion is the function that the mitochondrion is most well-known for and the one associated with cristae.

      Line 56: Unclear what authors mean by "canonical model of mitochondria"

      To clarify we changed this to “yeast or human” model of mitochondria.

      Lines 75-76: This applies to Mic10 only

      We removed the “high degree of conservation in other cristate eukaryotes” statement.

      Line 80: Cite DOI: 10.1016/j.cub.2020.02.053

      Done

      Fig 2D: I find this table difficult to read. If the authors keep the table format, they should at least get rid of the 'mean' column, as these data are better depicted in 2C. I suggest depicting these data as in 3B, showing the proportion of infected vs uninfected flies in all experiments, and then moving the modified table to the supplement. It is important to point out that experiment 5 appears to be an outlier, with reduced infectivity across all cell lines, including WT.

      To clarify: the mean reported in the table indicates the mean per replicate, while the mean reported in Figure 2C is the overall mean for a given genotype, which corrects for variability within experiments. We agree that moving the table to the supplementary data is a good idea. We decided not to include a graph of infected and non-infected mosquitoes, as this information would be partially misleading, highlighting a phenotype we argue to be influenced by the strong variability.

      Fig. 3C-G: I feel like these data repeatedly lead to the same conclusions. These are all different ways of showing what is depicted in Fig 2B: mitochondrial gross morphology is affected upon ablation of MICOS. I suggest that these graphs be moved to the supplement and replaced by the beautiful images.

      Thank you for the nice comment on our images. We have now moved part of the graphs to supplementary figure 6 and only kept the Relative Frequency, Sphericity and total mitochondria volume per cell in the main figure.

      Line 180: Be more specific with which tubulin isoform is used as a male marker and state why this marker was used in supplemental Fig S6.

      We have now specified the exact tubulin isoform used as the male gametocyte marker, both in the main text and in Supplementary Fig. S6. This is a commercial antibody previously known to work as an effective male marker, which is why we selected it for this experiment. This is now clearly stated in the manuscript.

      Line 196 and Fig 3C: the word 'intensities' in this context is very ambiguous. Please choose a different term (puncta, elements, parts?). This is related to major point 2i above.

      To clarify the biological effect that we can conclude from the measurement, we added an explanation in the respective section of the results, and we decided to replace the raw results of the plug-in readout with the deduced relative dispersion.

      Line 222: Report male/female crista measurements

      We added Supplementary Information 2, which contains the exact statistical tests and outcomes for all presented quantifications, as well as a per-sex statistical analysis of the data from Figure 4. Correspondingly, we extended Supplementary Information 2 with a per-sex colour code for the thin-section TEM data.

      Fig. 4B-E: depict data as violin plots or scatter plots like Fig. 2C to get a better grasp of how the crista coverage is distributed. It seems like the data spread is wider in the double KO. This would also solve the problem with the standard deviation extending beyond 0%.

      We changed this accordingly.

      Lines 331-333: Please clarify that this applies for some, but not all MICOS subunits. Please also see major point 1 above. Also, the authors should point out that despite their structural divergence, trypanosomal cryptic mitofilins Mic34 and Mic40 are essential for parasite growth, in contrast to their findings with PfMic60 (DOI: https://doi.org/10.1101/2025.01.31.635831).

      This has been changed accordingly.

      Line 320: incorrect citation. Related to point 1 above.

      Correct citation is now included in the text.

      Lines 333-335. This is related to the above. Again, some subunits appear to affect cell growth under lab conditions, and some do not. This and the previous sentence should be rewritten to reflect this.

      This has been changed accordingly.

      Lines 343-345: The sentence and citation 45 are strange. Regarding the former, it is about CHCHD10, whose status as a bona fide MICOS subunit is very tenuous, so I would omit this. Regarding the phenomenon observed, I think it makes more sense to write that Mic60 ablation results in partially fragmented mitochondria in yeast (Rabl et al., 2009 J Cell Biol. 185: 1047-63). Mitochondrial fragmentation is often a physiological response to stress. I would rewrite the sentence so as not to imply that mitochondrial fission (or fusion) is impaired in these KOs, or at least state that this is only one of several possibilities.

      The sentence has been replaced following the reviewer's suggestion. However, we still include the data from human cells, as this has also been shown in Stephens et al. 2020.

      Line 373: 'This indicates' is too strong. I would say 'may suggest' as you have no proof that any of the KOs disrupts MICOS. This hypothesis can be tested by other means, but not by penetrance of a phenotype.

      Done

      Line 376-377; 'deplete functionality' does not make sense, especially in the context of talking about MICOS subunit stability. In my opinion, this paragraph overinterprets the KO effects on MICOS stability. None of the experiments address this phenomenon, and thus the authors should not try to interpret their results in this context. See major point 1.

      We removed the sentence. Also, the entire paragraph has been shortened, restructured and wording was changed to address major point 1.

      Other suggestions for added value

      (1) Does Plasmodium Sam50 co-fractionate with Mic60 and Mic19 in BN PAGE (Fig. 1E)?

      While we did identify SAMM50 in our BN PAGE, the protein does not co-migrate with the MICOS components but instead co-migrates with other components of a putative sorting and assembly machinery (SAM) complex. As SAMM50, the SAM complex and the overarching putative mitochondrial intermembrane space bridging (MIB) complex are not mentioned in the manuscript, we decided not to include this information in the manuscript and instead show it in Author response image 1.

      Author response image 1.

      Reviewer #2 (Significance):

      The manuscript by Tassan-Lugrezin is predicated on the idea that Plasmodium represents the only system in which de novo crista formation can be studied. They leverage this system to ask the question whether MICOS is essential for this process. They conclude based on their data that the answer is no, which the authors consider unprecedented. But even if their claim is true that ABS is acristate, this supposed advantage does not really bring any meaningful insight into how MICOS works in Plasmodium.

      First, the positives of this manuscript. As has been the case with this research team, the manuscript is very sophisticated in its experimental approaches. The highlights are the beautiful and often conclusive microscopy performed by the authors. Unfortunately, only the localization of Mic60 and Mic19 was inconclusive, due to their very low expression.

      The examination of the MICOS mutants during the in vitro life cycle of Plasmodium falciparum is extremely impressive and yields convincing results. Mitochondrial deformation is tolerated during life cycle stage differentiation, with only a modest but significant reduction in oocyst production observed.

      However, despite the herculean efforts of the authors, the manuscript as it currently stands represents only a minor advance in our understanding of the evolution of MICOS, which from the title and focus of the manuscript, is the main goal of the authors.

      In its current form, the manuscript reports some potentially important findings:

      (1) Mic60 is verified to play a role in crista formation, as is predicted by its orthology to other characterized Mic60 orthologs.

      (2) The discovery of a novel Mic19 analog (since the authors maintain there is no significant sequence homology), which exhibits a similar (or the same?) complexome profile with Mic60. This protein was upregulated in gametocytes like Mic60 and phenocopies Mic60 KO.

      (3) Both of these MICOS subunits are essential (not dispensable) for proper crista formation

      (4) Surprisingly, neither MICOS subunit is essential for in vitro growth or differentiation from ABS to sexual stages, and from the latter to sporozoites. This says more about the biology of Plasmodium itself than about the essentiality of Mic60, i.e. Plasmodium life cycle progression tolerates defects in mitochondrial morphology. But yes, I agree with the authors that Mic60's apparent insignificance for cell growth under the examined conditions differs from its essentiality in other eukaryotes. However, fitness costs were not assayed (e.g. by competition between mutants and WT in infection of mosquitoes).

      (5) Decreased fitness of the mutants is implied by a reduction in oocyst formation.

      While interesting in their own way, collectively they do not represent a major advance in our understanding of MICOS evolution. Furthermore, the findings bifurcate into categories informing MICOS or Plasmodium biology. Both aspects are somewhat underdeveloped in their current form.

      This is unfortunate because there seem to be many missed opportunities in the manuscript that could, with additional experiments, lead to a manuscript with much wider impact. For me, what is remarkable about Plasmodium MICOS that sets it apart from other iterations is the apparent absence of the Mic10 subunit. Purification of plasmodium MICOS via the epitope tagged Mic60 and Mic19 could have verified that MICOS is assembled without this core subunit. Perhaps Mic60 and Mic19 are the vestiges of the complex, and thus operate alone in shaping cristae. Such a reduction may also suggest the declining importance of mitochondria in plasmodium.

      Another missed opportunity was to assay the impact of MICOS depletion on OXPHOS in Plasmodium.

      This is a salient issue as maybe crista morphology is decoupled from OXPHOS capacity in Plasmodium, which links to the apparent tolerance of mitochondrial morphology in cell growth and differentiation. I suggested in section A experiments to address this deficit.

      Finally, the authors could assay the fitness costs of MICOS ablation and associated phenotypes by testing whether mosquito infectivity is reduced in the mutants when they compete directly with WT Plasmodium. Like the authors, I am also surprised that MICOS mutants can pass the population bottlenecks represented by differentiation events. Perhaps the apparent robustness of differentiation contributes to Plasmodium's remarkable ability to adapt.

      I realize that the authors put a lot of effort into their study and, again, I am very impressed by the sophistication of the methods employed. Nevertheless, I think there are still better ways to increase the impact of the study than overinterpreting the conclusions from the data. But this would require more experiments along the lines I suggest in Section A and here.

      We thank the reviewer for their extensive analysis of the significance of our findings, including the compliments on our microscopy images and the sophisticated experimental approaches. We hope we have convincingly argued why we could or could not include some of the additional analyses suggested by the reviewer in section 1 above.

      With regard to the significance statement, we want to point out that our finding that PfMICOS is not needed for the initial formation of cristae (as opposed to their organization) confirms something that has been assumed by the field without being the actual focus of studies. We argue that the distinction between formation and organization of cristae is important and deserves some attention within the manuscript. We argue that the finding that MICOS is not involved in the initial formation of cristae is relevant to Plasmodium biology and beyond. As for insights into how MICOS works in Plasmodium, we have confirmed that the previously annotated PfMIC60 is indeed involved in the organization of cristae. Furthermore, we have identified and characterized PfMIC19. These findings, we argue, are indeed meaningful insights into PfMICOS.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      MICOS is a conserved mitochondrial protein complex responsible for organising the mitochondrial inner membrane and the maintenance of cristae junctions. This study sheds first light on the role of two MICOS subunits (Mic60 and the newly annotated Mic19) in the malaria parasite Plasmodium falciparum, which forms cristae de novo during sexual development, as demonstrated by EM of thin section and electron tomography. By generating knockout lines (including a double knockout), the authors demonstrate that knockout of both MICOS subunits leads to defects in cristae morphology and a partial loss of cristae junctions. With a formidable set of parasitological assays, the authors show that despite the metabolically important role of mitochondria for gametocytes, the knockout lines can progress through the life stages and form sporozoites, albeit with diminished infection efficiency.

      We thank the reviewer for their time and compliment.

      Major comments:

      (1) The authors should better present their findings in the right context, in particular by:

      i) giving a clearer description in the introduction of what is already known about the role of MICOS. This starts in the introduction, where one main finding is missing: loss of MICOS leads to loss of cristae junctions and the detachment of cristae membranes, which are nevertheless formed, but become membrane vesicles. This needs to be clearly stated in the introduction to allow the reader to understand the consistency of the authors' findings in P. falciparum with previous reports in the literature.

      We extended the introduction to include this information.

      iii) At the end of the introduction, the motivating hypothesis is formulated ad hoc: "conclusive evidence about its involvement in the initial formation of cristae is still lacking" (line 83). If there is evidence in the literature that MICOS is strictly required for cristae formation in any organism, then this should be explained, because the bona fide role of MICOS is the maintenance of cristae junctions (the hypothesis is still plausible and its testing important).

      To clarify we rephrased the sentence to: “Although MICOS has been described as an organizer of crista junctions, its role during the initial formation of nascent cristae has not been investigated.”

      (2) Line 96-97: "Interestingly, PfMIC60 is much larger than the human MICOS counterpart, with a large, poorly predicted N-terminal extension." This statement is lacking a reference and presumably refers to annotated ORFs. The authors should clarify if the true N-terminus is definitely known - a 120kDa size is shown for the P. falciparum but this is not compared to the expected length or the size in S. cerevisiae.

      To solve the reference issue, we added the UniProt IDs of the proteins we compared, which show that the annotated ORF is larger in Plasmodium. We also changed the comparison to yeast instead of human, because we realized it is confusing to compare to yeast throughout the figure but then refer to human in this specific sentence.

      Regarding whether the true N-terminus is known. Short answer: No, not exactly.

      However, we do know that the Pf version is about double the size of the yeast protein.

      As the reviewer correctly states, we show the size of 120kDa for the tagged protein in Figure 1G. Considering that we tagged the protein C-terminally, and observed a 120kDa product on western blot, it is safe to conclude that the true N-terminus does not deviate massively from the annotated ORF, and hence, that there is a considerable extension of the protein beyond a 60kDa protein. We do not directly compare to yeast MIC60 on our western blots, however, that comparison can be drawn from literature: Tarasenko et al., 2017 showed that purified MIC60 running at ~60kDa on SDS-PAGE actively bends membranes, suggesting that in its active form, the monomer of yeast MIC60 is indeed 60kDa in size.

      To clarify, we now emphasize that we ran the AlphaFold prediction on the annotated open reading frame (annotated and sequenced by Bohme et al. and Chappell et al., now cited in the manuscript), and revised the wording to make clear what we are comparing in each sentence.

      (3) Lines 244-245: "Furthermore, our data indicates the effect size increases with simultaneous ablation of both proteins." The authors should explain which data they are referring to, as some of the data in Figs 3 and 4 look similar and all significance tests relate to the wild type, not to comparisons between the different mutants, so it is not clear whether any observed differences are significant. The authors repeat this claim in the discussion in lines 368-369 without referring to a specific significance test. This needs to be clarified.

      In reply to this and other comments from the reviewers, we added multiple comparisons among all samples. In addition, to clarify the statistics used, we included a supplementary dataset with all p-values and statistical tests.

      (4) lines 304-306: "Though well established as the cristae organizing system, the role of MICOS in initial formation of cristae remains hidden in model organisms that constitutively display cristae.". This sentence is misleading since even in organisms that display numerous cristae throughout their life cycle, new cristae are being formed as the cells proliferate. Thus, failure to produce cristae in MICOS knockout lines would have been observable but has apparently not been reported in the literature. Thus, the concerted process in P. falciparum makes it a great model organism, but not fundamentally different to what has been studied before in other organisms.

      We deleted this statement.

      (5) lines 373-378. "where ablation of just MIC60 is sufficient to deplete functionality of the entire MICOS (11, 15),". The authors' claim appears to be contrary to what is actually stated in ref 15, which they cite:

      "MICOS subunits have non-redundant functions as the absence of both MICOS subcomplexes results in more severe morphological and respiratory growth defects than deletion of single MICOS subunits or subcomplexes."

      This seems in line with what the authors show, rather than "different".

      This sentence has been removed.

      (6) lines 380-385: "... thus suggesting that membrane invaginations still arise, but are not properly arranged in these knockout lines. This suggests that MICOS either isn't fully depleted,...". These conclusions are incompatible with findings from ref. 15, which the authors cite. In that study, the authors generated a ∆MICOS line which still forms membrane invaginations, showing that MICOS is not required at all for this process in yeast. Hence the authors' implication that MICOS needs to be fully depleted before membrane invaginations cease to occur is not supported by the literature.

      This sentence has been deleted in the revised version of the manuscript.

      Minor comments:

      (1) The authors should consider if the first part of their title could be seen as misleading: It suggests that MICOS is "the architect" in cristae formation, but this is not consistent with the literature nor their own findings.

      The title has been changed accordingly.

      - Line 43, of the three seminal papers describing the discovery of MICOS in 2011, the authors only cite two (refs 6 and 7), but miss the third paper, Hoppins et al, PMID: 21987634, which should probably be corrected.

      Done; the paper is now cited.

      - Page 2, line 58: for a more complete picture, the authors should also cite work by others showing that complexes such as complex III (a drug target) and ATP synthase do assemble, albeit at very low levels (Nina et al, 2011, JBC).

      Done

      - Page 3, line 80: "Irrespective of the shape of an organism's cristae, the crista junctions have been described as tubular channels that connect the cristae membrane to the inner boundary membrane (22, 24)." This omits the slit-shaped cristae junctions found in yeast (Davies et al, 2011, PNAS), which the authors should include.

      The paper and concept have been added to the manuscript, though the sentence has been moved up in the introduction, when crista junctions are first introduced.

      - Line 97: "poorly predicted N-terminal extension", as there is no experimental structure, we don't know if the prediction is poor. Presumably the authors mean either poorly ordered or the absence of secondary structure elements, or the poor confidence score for that region in the prediction? This should be clarified or corrected.

      We were referring to the poor confidence score. To address this comment as well as major point 2, we rewrote the respective paragraph. It now clearly states that confidence of the prediction is low, and we mention the tool that was used to identify conserved domains (Topology-based Evolutionary Domains).

      - Line 98: "an antiparallel array of ten β-sheets". They are actually two parallel beta-sheets stacked together. The authors could find out the name of this fold, but the confidence of the prediction is marked as low/very low. So its existence is unknown, not just its "function".

      We adapted the domain description to “a stack of two parallel beta-sheets" and replaced the statement on unknown function by the statement “Because this domain is predicted solely from computational analysis, both its actual existence in the native protein and its biological function remain unknown.”

      - Fig 1B: The authors show two alphafold predictions of S. cerevisiae and P. falciparum Mic60 structures. There is however an experimental Mic60/19 (fragment) structure from the former organism (PMID: 36044574), which should be included if possible.

      We appreciate the reviewer’s suggestion and note that the available structural data indeed provides valuable insight into how MIC60 and MIC19 interact. However, these structures represent fusion constructs of limited protein fragments and therefore capture only a small portion of each protein, specifically the interaction interface. Because our aim in Fig. 1B is to compare the overall domain architecture of the full-length proteins, we believe that including fragment-based structures would be less informative in this context.

      - Lines 318-321: "The same trend was observed for PfMIC19 and PfMIC60. Although transcriptomic data suggested that low-level transcripts of PfMIC19 and PfMIC60 are present in ABS (38), we did not detect either of the proteins in ABS by western blot analysis." While this statement is true, the authors should comment on the sensitivity of the respective methods: how well was the antibody working in their hands, and how do they interpret the absence of a WB band compared to the transcriptomics data?

      The HA antibody used in our experiments is a standard commercial reagent that performs reliably in both WB and IFA, although it shows a low background signal in gametocytes. We agree that the sensitivity of the method and the interpretation of weak or absent bands should be addressed explicitly. Transcript levels for both PfMIC19 and PfMIC60 in asexual blood stages fall below the 25th percentile, suggesting that these signals likely represent background. Nevertheless, we acknowledge that low-level protein expression below the detection limit of western blot analysis cannot be excluded. To reflect these considerations, we added the sentence: ‘The apparent absence could indicate that transcripts are not translated in ABS or that the proteins’ expression was below detection limits of western blot analysis.’

      - Lines 322-323: would the authors not typically have expected an IFA signal given the strength of the band in Western blot? If possible, the authors should comment if the negative fluorescence outcome can indeed be explained with the low abundance or if technical challenges are an equally good explanation.

      Considering the nature of the investigated proteins (embedded in the IMM and spread throughout the mitochondrion), difficulties in achieving a clear signal in IFA or U-ExM are not very surprising. While epitopes may remain buried in IFA, U-ExM usually increases accessibility for the antibodies. However, U-ExM comes at the cost of being prone to dotty background signals, and may therefore hide low-abundance, naturally dotty signals such as that of MICOS proteins, which localize to distinct foci (at the CJ) along the mitochondrion. Current literature suggests that, in both human and yeast, STED is the preferred method for accurate spatial resolution of MICOS proteins (https://www.ncbi.nlm.nih.gov/pubmed/32567732, https://www.ncbi.nlm.nih.gov/pubmed/32067344). Unfortunately, we do not have experience with, nor access to, this particular technique.

      - Lines 357-365: the authors describe limitations of the applied methods adequately. Perhaps it would be helpful to make a similar statement about the analysis of 3D objects like mitochondria and cristae from 2D sections. E.g. the apparent cristae length depends on whether cristae are straight (e.g. coiled structures do not display long cross sections despite their true length in 3D).

      The limitations of other methods are described in the respective results section.

      We added a clarifying sentence in the results section of Figure 4:

      “Note that such measurements do not indicate the true total length or width of cristae, as the data is two-dimensional. The recorded values are to be considered indicative of possible trends, rather than absolute dimensions of cristae.”

      This statement refers to the length/width measurements of cristae.

      In the context of Figure 4D we mention the following (see preprint lines 229 – 230): “We expect this effect to translate into the third dimension and thus conclude that the mean crista volume increases with the loss of either PfMIC19, PfMIC60, or both.”

      For Figure 5, we included a clarifying statement in the results section of the preprint (lines 269 – 273): “Note that these mitochondrial volumes are not full mitochondria, but large segments thereof. As a result of the incompleteness of the mitochondria within the section, and the tomography specific artefact of the missing wedge, we were unable to confirm whether cristae were in fact fully detached from the boundary membrane, or just too long to fit within the observable z-range.”

      - Line 404: perhaps undetected or similar would be a better description than "hidden"?

      The sentence does not exist in the revised manuscript.

      Reviewer #3 (Significance):

      The main strength of the study is that it provides the first characterisation of the MICOS complex in P. falciparum, a human parasite in which the mitochondrion has been shown to be a drug target. Mic60 and the newly annotated Mic19 are confirmed to be essential for proper cristae formation and morphology, as well as overall mitochondrial morphology. Furthermore, the mutant lines are characterised for their ability to complete the parasite life cycle, and defects in infection efficiency are observed. This work is an important first step for deciphering the role of MICOS in the malaria parasite and the composition and function of this complex in this organism. The main limitation of the study is that MICOS and its subunits are already known in great detail from yeast and humans, with similar findings regarding loss of cristae and cristae defects. The findings of this study do not provide dramatic new insight into MICOS function or go substantially beyond the vast existing literature in terms of the extent of the study, which focuses on parasitological assays and morphological analysis. Exploring the role of MICOS in an early-divergent organism and human parasite is, however, important given the divergence found in mitochondrial biology, and P. falciparum is a uniquely suited model system. One aspect that would increase the impact of the paper would be if the authors could mechanistically link the observed morphological defects to the decreased infection efficiency, e.g. by probing effects on mitochondrial function. This will likely be challenging, as the morphological defects are diverse and the fitness defects appear moderate/mild.

      As suggested by Reviewer 2, we examined mitochondrial membrane potential in gametocytes using MitoTracker staining and did not observe any obvious differences associated with the morphological defects. At present, additional assays to probe mitochondrial function in P. falciparum gametocytes are not sufficiently established, and developing and validating such methods would require substantial work before they could be applied to our mutant lines. For these reasons, a more detailed mechanistic link between the observed morphological changes and the reduced infection efficiency is currently beyond reach.

      The advance presented in this study is to pioneer the study of MICOS in P. falciparum, thus extending our understanding of the role of this complex to a different model organism. This study will likely be mainly of interest to specialised audiences such as basic research parasitologists and mitochondrial biologists. My own field of expertise is mitochondrial biology and structural biology.

    1. eLife Assessment

      This valuable study used genetic and pharmacological manipulations of insulin/IGF signaling to address the role of the insulin/IGF axis in the function of renal glomerular podocytes. Solid data are presented to demonstrate that co-inhibition of insulin/IGF signaling in podocytes led to aberrant splicing of mRNAs, which could contribute to the loss of podocytes in vitro and in vivo in mice. In light of the fact that IR/IGF-1R signaling is critically required for normal development and growth in multiple cells and organs, the lack of assessment of the developmental phenotype of podocytes in the mouse model limits the interpretation of the data.

      [Editors' note: this paper was reviewed by Review Commons.]

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the role of the insulin receptor and the insulin-like growth factor 1 receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, with proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines in which both receptors were deleted. Loss of both receptors was highly deleterious, with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.

      This is primarily a descriptive study, and no technical concerns are raised. The mechanism by which insulin and IGF1 signaling regulate splicing is not directly addressed, but potentially implicates phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Although proteinuria is a strong sign of renal injury, direct measurement of GFR and serial serum creatinine measurements might further enhance our understanding of disease progression. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful, but the effect may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      Significance:

      With GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that insulin and IGF1R signaling play important roles in other cells of the kidney, so there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.

      Latest comments:

      The new reviewer raised two major points: whether the effect of the KO on splicing is specific to loss of IR/IGF1R signaling, and whether the phenotype could be developmental rather than due to splicing. The reviewer raises some important issues, but the evidence for specificity comes from the literature showing that IR/IGF signaling is already known to regulate splicing, and from the fact that splicing defects were not detected in other models the authors have analyzed. I agree with the reviewer (and the authors) that the incomplete floxing of the genes is a major complication. There could be a developmental defect, with mice being born with fewer podocytes, and perhaps the authors should caveat this point. However, the fact that the mice are born with normal function, and that renal function can be maintained with up to 80% loss of podocytes, suggests that they are likely born with a good number of podocytes, and that the dysfunction that occurs at 6 months is due to a process, induced by the loss of IR/IGF signaling, that is detrimental to the podocyte.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mouse podocytes. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long-read RNA sequencing was used to assess splicing events, and podocyte transcripts showing intron retention were identified. Dual knockout podocytes had more transcripts with intron retention (18%) than wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well written. Mouse work was approved in advance. The four figures are generally well designed, using bar charts with superimposed dot plots.

      Methods are generally well described.

      Comments on previous version:

      Coward and colleagues have done an excellent job of responding to all the reviewer comments.

    4. Reviewer #4 (Public review):

      Summary and background:

      This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. The insulin/IGF signaling system in mammals evolved as three peptides arising from gene duplication (insulin, IGF-1, and IGF-2) and two receptors, IR and IGF1R, that cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling, in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor-driven and metabolic mTORC1-mediated processes, as well as other processes including the splicing of RNA for protein synthesis.

      Comments on revised version:

      The second sentence of the Summary reads "This study sought to elucidate the compound role of the insulin/IGF1 axis in podocytes using transgenic mice and cell culture models deficient in both receptors." The study design and rationale for the proteomic analysis described are predicated on the finding that podocyte-specific knockdown of the IR/IGF-1R in mice is associated with development of proteinuria and reduced eGFR by 20 months of life. Since the IR/IGF-1R are critically required for normal development and growth of all cells and organs, the obvious explanation for the observation would be that the model system results in defective podocyte development and deployment (caused by reduced IR/IGF-1R signaling) that, in turn, causes subsequent development of proteinuria and glomerulosclerosis (which may be much less dependent on a normal level of IR/IGF-1R expression). Thus, the experimental design does not allow a distinction between podocyte development and steady-state function, which are different biologic processes. The data provided do not examine podocyte status immediately after birth to confirm that podocyte number, size and structure are normal in mice that subsequently develop proteinuria and glomerulosclerosis. The response to the reviewer suggests that, since this would require additional mice, it has not been undertaken in order to reduce animal usage. This is not a valid argument, particularly when the investigators have not even used state-of-the-art methods to measure podocyte number, size and density in adult mice, key parameters that would be required to interpret their data. Counting podocyte nuclear number in glomerular cross-sections is simply an inadequate method, even if it is used and reported in other journals, and particularly where the examples given to justify its use can hardly be viewed as representing first-rate science.

      In the absence of studies that would answer the above questions, the investigators should add a sentence to the Discussion dealing with study limitations as follows. "The study design does not allow us to determine whether the primary effect of reduced IR/IGF-1R expression on the phenotype is during in utero and post-natal podocyte development and deployment, during periods of rapid growth when IGF-1 levels are highest, in steady state adult podocytes, or under all of the above conditions".

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the role of the insulin receptor and the insulin-like growth factor 1 receptor was investigated in podocytes. Mice in which both receptors were deleted developed glomerular dysfunction, with proteinuria and glomerulosclerosis over several months. Because of concerns about incomplete KO, the authors generated and studied podocyte cell lines in which both receptors were deleted. Loss of both receptors was highly deleterious, with greater than 50% cell death. To elucidate the mechanism of cell death, the authors performed global proteomics and found that spliceosome proteins were downregulated. They confirmed this directly by using long-read sequencing. These results suggest a novel role for insulin and IGF1R signaling in RNA splicing in podocytes.

      This is primarily a descriptive study, and no technical concerns are raised. The mechanism by which insulin and IGF1 signaling regulate splicing is not directly addressed, but potentially implicates phosphorylation downstream of these receptors. In the revised manuscript, it is shown that the mouse KO is incomplete, potentially explaining the slow onset of renal insufficiency. Although proteinuria is a strong sign of renal injury, direct measurement of GFR and serial serum creatinine measurements might further enhance our understanding of disease progression. An attempt to rescue the phenotype by overexpression of SF3B4 would also be useful, but the effect may be masked by defects in other spliceosome genes. As insulin and IGF are regulators of metabolism, some assessment of metabolic parameters would be an optional add-on.

      Significance:

      With GLP1 agonists providing renal protection, there is great interest in understanding the role of insulin and other incretins in kidney cell biology. It is already known that insulin and IGF1R signaling play important roles in other cells of the kidney, so there is great interest in understanding these pathways in podocytes. The major advance is that these two pathways appear to have a role in RNA metabolism.

      Comments on revised version:

      I'm satisfied with the revised manuscript and the responses to my previous concerns.

      Thank you.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, submitted to Review Commons (journal agnostic), Coward and colleagues report on the role of the insulin/IGF axis in podocyte gene transcription. They knocked out both the insulin receptor and IGF1R in mouse podocytes. Dual KO mice manifested a severe phenotype, with albuminuria, glomerulosclerosis, renal failure and death at 4-24 weeks.

      Long-read RNA sequencing was used to assess splicing events, and podocyte transcripts showing intron retention were identified. Dual knockout podocytes had more transcripts with intron retention (18%) than wild-type controls (14%), with an overlap between experiments of ~30%.

      Transcript productivity was also assessed using FLAIR-mark-intron-retention software. Intron retention was seen in 18% of ciDKO podocyte transcripts compared to 14% of wild-type podocyte transcripts (P=0.004), with an overlap between experiments of ~30% (indicating the variability of results with this method). Interestingly, ciDKO podocytes showed downregulation of proteins involved in spliceosome function and RNA processing, as suggested by LC/MS and confirmed by Western blot.

      Pladienolide (a spliceosome inhibitor) was cytotoxic to HeLa cells and to mouse podocytes but no toxicity was seen in murine glomerular endothelial cells.

      The manuscript is generally clear and well written. Mouse work was approved in advance. The four figures are generally well designed, using bar charts with superimposed dot plots.

      Methods are generally well described.

      Comments on revised version:

      Coward and colleagues have done an excellent job of responding to all the reviewer comments.

      Thank you.

      Reviewer #4 (Public review):

      Summary and background:

      This report entitled "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte" from Hurcombe et al is based on a mouse double knockdown of the IR and IGF1R and a parallel cultured mouse podocyte model. The insulin/IGF signaling system in mammals evolved as three peptides arising from gene duplication (insulin, IGF-1, and IGF-2) and two receptors, IR and IGF1R, that cross-react to variable extents with the peptides, are ubiquitously expressed, and signal through parallel pathways. The major downstream effect of insulin is to regulate glucose uptake and metabolism, while that of the IGF pathways is to regulate growth and cell cycling, in part through mTORC1. The GH-IGF-1-IGF1R pathway regulates post-natal growth. IGF-2 signaling is thought to play a major role in regulating intrauterine growth and development, although IGF-2 is also present at high levels in post-natal life. Thus, one would anticipate that reducing IR/IGF1R signaling in any cell would slow growth and cell cycling by reducing growth factor-driven and metabolic mTORC1-mediated processes, as well as other processes including the splicing of RNA for protein synthesis.

      Thank you for this additional review and for assessing our paper with new suggestions (we addressed the previous suggestions to the satisfaction of the other reviewers). Of note, regarding this introduction, the podocyte is a terminally differentiated cell and may have unique responses to insulin/IGF, as it is generally accepted that it does not proliferate (hence we consider understanding the actions of insulin/IGF and their receptors to be of interest). Indeed, we have recently shown a contrasting effect of IGF signalling in the podocyte: partial suppression of the IGF1 receptor is beneficial, in contrast to near-complete suppression, which results in mitochondrial dysfunction (PMID:38706850).

      Mouse IR/IGF1R double knockdown model:

      A double knockdown mouse model was generated by interbreeding mice with different genetic backgrounds carrying floxed sites for IR and IGF-1R to produce mixed-background offspring with both floxed IR and IGF-1R genes. These mice were crossed so that the podocin promoter-driven Cre (which comes on at about embryonic day 12, as podocytes are developing) would delete the IR and IGF-1R genes. Since podocin is believed to be an absolutely podocyte-specific protein, the podocin promoter is predicted to knock down the IR and IGF1R genes only in podocytes. The weight and growth of double KO offspring were not different from controls, but some proportion of the double knockdown mice subsequently developed proteinuria by 6 months and 20% died, although no specific data are provided to identify the cause of the deaths, since eGFR was not decreased. Surviving mice were evaluated at 6 months of age. The efficacy of knockdown was not demonstrated in the mouse model itself, although a temperature-sensitive cell line developed from these double knockdown mice showed that expression of IR and IGF-1R proteins in the Cre-treated cell line was reduced by about 50% for each (no statistical analysis of this result provided).

      In the knockout mice, proteinuria was significantly increased by 6 months, but not at earlier time points. Histologic analysis showed proteinaceous casts, glomerulosclerosis and interstitial fibrosis. Podocyte number was stated to be reduced by about 30% in double knockdown mice, although the method by which this was evaluated seems to have been by counting WT1 positive nuclei in glomerular cross-sections, an approach that is well-known not to be a reliable way of assessing true podocyte number. No information is provided about podocyte size, density or glomerular volume.

      Comment: If IR/IGF1R deletion plays a significant role in normal podocyte function sufficient to cause proteinuria and glomerulosclerosis, then the effect of reduced IR and IGF1R protein expression on podocyte function would have been expected to produce a phenotype before 6 months. A more likely scenario to explain the overall result is that deleting the IR and IGF1R genes at about embryonic day 12 impacted podocyte development to a variable extent, such that some mice developed fewer podocytes per glomerulus than other mice. As mice grow and their glomeruli and glomerular capillary area increase, those mice with fewer podocytes would not be able to completely cover the filtration surface with foot processes and would develop proteinuria and glomerulosclerosis. If reduced podocyte number per glomerulus is the proximate cause of the observed proteinuria, then modulation of the body and kidney growth rate by calorie restriction to slow growth (lower circulating IGF-1 levels) would be expected to be protective, while a high-protein, high-calorie diet (higher circulating IGF-1 levels) or uni-nephrectomy to increase kidney growth rate would be expected to enhance proteinuria and glomerulosclerosis.

      Thank you for these comments. In response to them:

      (1) WT1 as a marker of podocyte number. We agree that counting WT1-positive nuclei may not be the most accurate way of precisely measuring podocyte number, but it is widely accepted in the field (PMID:33655004 / PMID:38542564), and we think it convincingly shows fewer podocytes at 6 months.

      (2) Podocyte size and density were not measured. This was not the focus of the paper, and the histology clearly showed a significant phenotype in several mice (Figs 1D-F). Of note, we did objectively assess a glomerulosclerosis index (Fig 1D). We took the approach of understanding mechanism through non-biased proteomics and phospho-proteomics of conditionally immortalised podocytes in which we had convincingly knocked down the insulin and IGF1 receptors (Figure 2).

      (3) You did not study the mice earlier to ascertain the developmental phenotype. We concede we did not do this, but there was no significant proteinuria detected early in the mice, so we elected not to increase mouse numbers by studying them at that stage (which we consider good practice for reduction, replacement and refinement). We suspect there would have been subtle changes in those mice that had significantly reduced simultaneous IR and IGF1R knockdown. It was precisely because of this that we generated a conditionally immortalised podocyte cell line with robust simultaneous knockdown of both receptors.

      (4) You did not show significant insulin and IGF1 receptor knockdown in the conditionally immortalised cell line (reviewer states it was 50%). We clearly knocked down both receptors (insulin and IGF1R) in the podocyte line by >80%, which was highly statistically significant (p<0.00001; Figure 2A). We agree this was crucial (and we made the cell line because of the variability in the mouse model).

      The model as used may be more representative of a variable degree of podocyte depletion than an effect of impaired IR/IGF1R signaling. Therefore, although the phenotype may be ultimately attributable to the IR/IGF1R gene deletions the proteinuria and glomerulosclerotic phenotype itself was probably a consequence of defective podocyte development. Examining podocyte number, size, density and glomerular volume at earlier time points (4 weeks) would help to answer this question. Therefore, a more appropriate title would be "The insulin/IGF axis is critically important (for) normal podocyte development and deployment". In this context the effect of the knockdowns on splicing would make more sense.

      Please see our response (above). We think our final conclusion, that in the podocyte the insulin/IGF axis is important for spliceosome activity and control, is valid. This is based on our findings (both total and phospho-proteomics results) and on recent papers showing that this axis can rapidly phosphorylate a variety of spliceosome proteins in different cell types (PMID:39939313 / PMID:32888406). All of this is discussed in detail in the manuscript.

      Cell culture studies. A cell line was generated using a temperature sensitive SV40 system that has been previously reported from this laboratory. A detailed analysis is provided to show that double knockout cells exhibited abnormal spliceosome activity. This forms the basis for the conclusion that "The insulin/IGF axis is critically important (for) controlling gene transcription in the podocyte". There are several concerns that weaken this conclusion.

      (1) In the double knockdown cell culture system about 30% of cells were "lost" by 3 days and about 70% of cells were "lost" by 5 days. The studies were done at the 3-day time point. It is not clear whether "lost" cells were in the process of dying, undergoing stress-induced detachment, or just growing more slowly than controls due to reduced IR and IGF-1R signaling. These processes could have impacted splicing in a non-specific way independent of IR/IGF1R signaling itself.

      (2) Can a single cell line derived from the double floxed mice be relied on to provide an unbiased picture of the effect of deleting IR and IGF-1R? Presumably, the transfection and selection process will select for cells that survive, thereby introducing unknown biases, possibly related to spliceosome function. Is a single cell line adequate? These investigators have extensive experience with this type of analysis, but this question is not addressed in the discussion.

      (3) To determine whether the effect is specific to reduced IR/IGF1R signaling, the deletion of IR and IGF-1R could be corrected by transfecting full-length IR and IGF-1R cDNAs into the cells to restore normal IR/IGF1R signaling. If restoring IR and IGF-1R expression and activity in the transfected cells returned spliceosome activity to normal, this would be evidence that the receptors themselves play some role in spliceosome activity, as opposed to a downstream effect of growth limitation/stress on the cells.

      (4) Other ways of testing whether the splicing effect is specifically due to reduced IR/IGF-1R signaling would be to (a) block IR and IGF1R using available inhibitors, (b) remove or reduce insulin, IGF-1 and IGF-2 levels in the culture medium, (c) use low-glucose, low-amino-acid culture medium to slow the growth rate independently of receptor function, or (d) block intracellular signaling downstream of the IR and IGF-1R, for example through mTORC1 inhibition using rapamycin, or via other signaling targets.

      (5) It would be useful to determine whether cultured cells stressed in other ways (e.g. ischemia, toxins, etc.) also show the same splicing abnormalities.

      Point 1. 70% cell loss was observed at day 7 (not day 5). We found approximately 20% loss at day 3. We opted for this early time point, hypothesising that the key detrimental processes would already be evident then. This 3-day time point also ensures there has been enough time for expression of Cre recombinase, receptor gene excision and degradation of existing endogenous IR/IGF1R following lentiviral transduction. Interestingly, we did not find a major “death or apoptosis” signal in our data at this time point, but we agree it should be considered. We think this is a specific pathway because we have previously used proteomics to examine several other conditionally immortalised podocyte cell lines with detrimental manipulations and a much more severe cell-death phenotype (e.g. podocyte GSK3 alpha/beta knockdown), and we detected NO spliceosome signal (PMID:30679422). Furthermore, other podocyte proteomics “stress” studies have now been published in which there is proteinuria and significant cell loss/death that also do not show spliceosome dysfunction. These include the detailed proteomic signature of podocytes stressed with doxorubicin and lipopolysaccharide (LPS) endotoxin in mice (PMID:32047005) and bradykinin stimulation of rat podocytes (PMID:32518694).

      Point 2. Yes, we think it is valuable and reproducible. We generated a podocyte cell line from insulin receptor and IGF1 receptor homozygous floxed cells. Hence there is no selection bias in the cells when generating the line as both receptors are effectively intact. We then temporally “knocked down” the receptors with extrinsic lentiviral Cre.

      Importantly, we validated our cell line findings both in the cells (with Western blotting) and in our transgenic receptor knockdown mice, and found evidence of spliceosomal dysregulation (Figure 3E and 3F). Also, as discussed above, the spliceosome has been linked to the insulin/IGF pathway in other models.

      Point 3. We don’t think the experiment of knocking down the receptors and then reconstituting them would prove this hypothesis. This is because if the splicing abnormality were due to generalised cell dysfunction (which we do not think is the case in this situation), then putting the receptors back may simply restore cell health and spliceosomal function (i.e. it would not prove that the effect is mediated via the receptors). Secondly, the process of transduction with multiple lentiviruses may be inherently stressful to the cell, and a high level of extrinsic receptor may be expressed, which may also be confounding/detrimental. Finally, as discussed, there are now several lines of evidence describing insulin/IGF signalling to spliceosomal proteins, which we consider important (discussed in detail in the paper).

      Point 4. We think modulating the receptors using the Cre-lox approach is the cleanest approach (with fewer off-target effects) to interrogate the insulin/IGF axis. It allows us to differentiate the cells by thermo-switching (which is crucial for this terminally differentiated cell) and then robustly knock down both receptors simultaneously to investigate mechanism. We agree these supplementary approaches may give some extra information if their limitations (e.g. off-target effects of inhibitors) are also taken into consideration.

      Point 5. They do not. Please see response to point 1 above regarding GSK3, Doxorubicin, LPS and bradykinin challenge.

    1. eLife Assessment

      This study presents a valuable finding relating to how the state of arousal is represented within the superior colliculus (SC), a principal visuo-oculomotor structure. The main conclusion that the SC's neural representation of arousal is segregated from motor-related output appears to be solidly supported by the data. The work will be of interest to sensory, motor and cognitive neuroscientists.

    2. Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies that allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.
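      To make the slow-drift idea concrete, the following is a minimal Python sketch of one way a slow-drift axis could be estimated: smooth each neuron's delay-period spike counts across trials with a moving average, then take the first principal component of the smoothed population activity. The function name `slow_drift_axis` and the 50-trial window are hypothetical choices for illustration only; the review does not give the authors' binning or smoothing parameters, and this is not their pipeline.

```python
import numpy as np


def slow_drift_axis(counts, window=50):
    """Estimate a slow-drift axis as PC1 of trial-smoothed spike counts.

    counts: (n_trials, n_neurons) delay-period spike counts in trial order.
    window: moving-average width in trials (hypothetical choice).
    Returns the unit-norm axis (n_neurons,) and the projection of the
    smoothed, mean-centred activity onto it (one value per smoothed trial).
    """
    counts = np.asarray(counts, dtype=float)
    n_trials, n_neurons = counts.shape
    if window > n_trials:
        raise ValueError("window must not exceed the number of trials")

    # Moving average over trials keeps only slow fluctuations in each neuron.
    kernel = np.ones(window) / window
    smoothed = np.column_stack(
        [np.convolve(counts[:, j], kernel, mode="valid") for j in range(n_neurons)]
    )

    # PCA via SVD of the mean-centred data: the first right-singular
    # vector is the first principal component (loading vector).
    centered = smoothed - smoothed.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]
    projection = centered @ axis
    return axis, projection
```

      Because the sign of a principal component is arbitrary, any comparison of the projection with pupil size, or with slow drift estimated in a second area such as PFC, should not rely on the sign of the axis alone.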

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

    3. Reviewer #2 (Public review):

      Summary:

      Neurons in motor-related areas have increasingly been shown to also carry other, non-motoric signals. This creates the problem of avoiding interference between motor and non-motor signals, a significant problem that likely affects many brain areas. The specific example studied here is interference between saccade-related activity and slowly changing arousal signals in the superior colliculus. The authors identify neuronal activity related to saccades and arousal. Identifying saccade-related activity is straightforward, but arousal-related activity is harder to identify. The authors first identify a potential neuronal correlate of arousal, using PCA to identify a component in the population activity corresponding to slow drift over the recording session. Next, they link this component to arousal by showing that the component is present across different brain areas (SC and PFC), and that it is correlated with pupil size, an external marker of arousal. Having identified an arousal-related component in SC, the authors next show that SC neurons with strong motor-related activity are less strongly affected by this arousal component (both SC and PFC). Lastly, they show that SC population activity patterns related to saccades and pupil size form orthogonal subspaces in the SC population.

      Strengths:

      A great strength of this research is the clear description of the problem, its relationship with the performed analysis and the interpretation of the results. The paper is very well written and easy to follow.

      An additional strength is the use of fairly sophisticated analysis using population activity.

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that such effects are not examined, as one would actually expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlation specifically. The authors should test whether this interpretation is correct. If so, where are the low and high correlations, respectively? Are there potentially two functional areas in SC?

      Comments on the first revision:

      My main concern with the paper is really two-fold. First, I think it is only incremental and adds next to no useful information about the SC. That might not be a fair criticism and certainly is purely subjective, but it affects the standards that eLife has on significance thresholds for papers. As such, this is an issue the editors should talk about.

      Second, my main concern with the substance of the paper is that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see some behavioral indicators of arousal, such as RT differences, pupil size (they talk about this), or accuracy. The authors first need to describe the objective behavioral indicators of the level of arousal. Using these indices, they need to establish that there are meaningful differences in the level of arousal across the recording session. Having done so, they can proceed to link changes in SC activity with levels of arousal.

      Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'. I hope it is clear why that is premature. The 'slow-drift' fluctuations are presumed to be related to arousal, but they could be meaningless random fluctuations, or related to some other cognitive process.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

      Comments on the latest version:

      They have constructively responded to my concerns. I think 'incomplete' should be replaced with 'solidly supported'.

    4. Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity, as in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the SC neurons with a lower motor index. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.
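      The review describes the motor index only qualitatively (low for visually dominated neurons, high for motor-dominated neurons). As an illustration, a common contrast-style index is sketched below in Python, assuming it is computed as (motor - visual) / (motor + visual) from mean visual-epoch and saccade-epoch firing rates. The function name and this formula are assumptions; the manuscript's actual definition, epochs and normalisation may differ.

```python
def motor_index(visual_rate: float, motor_rate: float) -> float:
    """Hypothetical contrast-style motor index in [-1, 1].

    Assumption (not taken from the manuscript):
        index = (motor - visual) / (motor + visual),
    where both inputs are non-negative mean firing rates (spikes/s).
    Values near -1 indicate visually dominated neurons; values near +1
    indicate motor-dominated neurons.
    """
    if visual_rate < 0 or motor_rate < 0:
        raise ValueError("firing rates must be non-negative")
    total = visual_rate + motor_rate
    if total == 0:
        raise ValueError("at least one rate must be positive")
    return (motor_rate - visual_rate) / total


# A visually dominated neuron gets a low index, a motor-dominated one a high index.
print(motor_index(30.0, 10.0))  # -0.5
print(motor_index(10.0, 30.0))  # 0.5
```

      A normalised contrast like this makes the index insensitive to a neuron's overall firing rate, which is why it is a frequent choice for placing neurons on a visual-to-motor continuum.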

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by the pupil diameter because they do not have visual sensitivity.

      Comments on revisions:

      The authors have given due consideration to the possibility that SC signaling of arousal could be at least in part due to changes in pupil size related responses to ambient light. Discussion of this point in the most recent revision helps to mitigate this concern.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Johnston and Smith used linear electrode arrays to record from small populations of neurons in the superior colliculus (SC) of monkeys performing a memory-guided saccade (MGS) task. Dimensionality reduction (PCA) was used to reveal low-dimensional subspaces of population activity reflecting the slow drift of neuronal signals during the delay period across a recording session (similar to what they reported for parts of cortex: Cowley et al., 2020). This SC drift was correlated with a similar slow-drift subspace recorded from the prefrontal cortex, and both slow-drift subspaces tended to be associated with changes in arousal (pupil size). These relationships were driven primarily by neurons in superficial layers of the SC, where saccade sensitivity/selectivity is typically reduced. Accordingly, delay-period modulations of both spiking activity and pupil size were independent of saccade-related activity, which was most prevalent in deeper layers of the SC. The authors suggest that these findings provide evidence of a separation of arousal- and motor-related signals. The analysis techniques expand upon the group's previous work and provide useful insight into the power of large-scale neural recordings paired with dimensionality reduction. This is particularly important with the advent of recording technologies that allow for the measurement of spiking activity across hundreds of neurons simultaneously. Together, these results provide a useful framework for comparing how different populations encode signals related to cognition, arousal, and motor output in potentially different subspaces.

      Comments on revised manuscript:

      The authors have done a very good job of responding to all of the reviewers' concerns.

      No weaknesses to address.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The greatest weakness in the present research is the fact that arousal is a functionally less important non-motoric variable. The authors themselves introduce the problem with a discussion of attention, which is without any doubt the most important cognitive process that needs to be functionally isolated from oculomotor processes. Given this introduction, one cannot help but wonder why the authors did not design an experiment in which spatial attention and oculomotor control are differentiated. Absent such an experiment, the authors should spend more time explaining the importance of arousal and how it could interfere with oculomotor behavior.

      (2) In this context, it is particularly puzzling that such effects are not examined, as one would actually expect effects of arousal on oculomotor behavior. Specifically, saccade reaction time, accuracy, and speed could be influenced by arousal. The authors should include an analysis of such effects. They should also discuss the absence or presence of such effects and how they affect their other results.

      (3) The authors use the analysis shown in Figure 6D to argue that across recording sessions the activity components capturing variance in pupil size and saccade tuning are uncorrelated. However, the distribution (green) seems to be non-uniform, with peaks at very low and very high correlation specifically. The authors should test whether this interpretation is correct. If so, where are the low and high correlations, respectively? Are there potentially two functional areas in SC?

      Comments on revised manuscript:

      I remain somewhat concerned that the authors jump immediately into an analysis of the 'arousal-related' effects on SC activity. Before that, I would like to see a more detailed discussion justifying the use of pupil size alone (i.e., without other indicators such as RT) as indicative of fluctuations in general arousal that cause concomitant changes in SC activity. Instead, in its current form, the authors find changes in SC activity and describe them immediately as 'arousal-related'.

      Other than this conceptual issue, I do not have major problems with the analysis per se.

      We agree with the reviewer that we may have moved too quickly into discussing arousal-related effects in the previous version of the manuscript without providing a thorough explanation for why we think the slow drift axis is associated with changes in the monkey’s arousal levels. Arousal has been linked to the size of the pupil as well as movements of the eyes in numerous previous studies. We have made the following changes in the revised manuscript to address the reviewer’s concern:

      (1) When first describing how the spiking responses of SC neurons fluctuate over the course of a recording session (Lines 130-132), we have used the phrase "slow fluctuations in the spiking responses" rather than "arousal-related fluctuations in the spiking responses". Then, when describing these effects in more detail (Lines 136-147), we have explained why we think these fluctuations may be related to arousal. The following text has been added in the revised manuscript for clarification:

      “We found that this low-dimensional pattern of activity in the SC was also correlated with pupil size in the present study and with simultaneously recorded data in the prefrontal cortex (PFC), pointing to a link between this brain-wide fluctuation and changes in the monkeys’ arousal levels while performing the task.” (Lines 136-147)

      (2) We have changed the subheading in Line 183 of the revised manuscript from "Arousal-related fluctuations are present in the SC and correlated with pupil size and fluctuations in PFC activity" to "Slow fluctuations in SC spiking activity are correlated with pupil size and PFC activity". Given that we have not yet explained the results linking these fluctuations to arousal at this stage of the manuscript, we believe that this revised title is more accurate and avoids jumping too quickly to arousal-related fluctuations without first explaining the link between SC slow drift, pupil size and PFC activity.

      (3) We have provided additional justification for using pupil size and PFC activity to assess whether SC slow drift is associated with changes in the monkeys’ arousal levels. In a previous study, we computed an identical slow drift axis for spiking responses in visual cortex (V4) and PFC, and investigated how these low-dimensional neural activity patterns, which were themselves strongly correlated, were associated with various eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). Results showed that pupil size was the strongest predictor of slow drift in V4 and PFC. Given that the eye metrics were also strongly correlated with each other, we believe that the observed relationship between SC slow drift, pupil size and PFC activity provides sufficient evidence to suggest that the fluctuations observed in the SC are arousal-related. The following text has been added to the Results section of the revised manuscript:

      “Moreover, previous work in our laboratory computed a similar slow-drift axis using spiking activity in visual cortex (V4) and PFC, and investigated the relationship between these low-dimensional neural activity patterns and different eye-related metrics (e.g., pupil size, microsaccade rate, reaction time, saccade velocity). In addition to observing a strong correlation between V4 and PFC slow drift, we found that, relative to the other eye-related metrics, pupil size was the strongest predictor of these fluctuations (Johnston et al., 2022a). Thus, to further confirm the link between the SC slow drift axis and changes in the monkeys’ arousal levels while they performed the MGS task, we next sought to explore if projections onto the SC slow drift axis were associated with pupil size.” (Lines 236-344)

      Reviewer #3 (Public review):

      Summary:

      This study looked at slow changes in neuronal activity (on the order of minutes to hours) in the superior colliculus (SC) and prefrontal cortex (PFC) of two monkeys. They found that SC activity shows slow drift in neuronal activity, as in the cortex. They then computed a motor index in SC neurons. By definition, this index is low if the neuron has stronger visual responses than motor responses, and it is high if the neuron has weaker visual responses and stronger motor responses. The authors found that the slow drift in neuronal activity was more prevalent in the low motor index SC neurons and less prevalent in the high motor index neurons. In addition, the authors measured pupil diameter and found it to correlate with slow drifts in neuronal activity, but only in the SC neurons with a lower motor index. They concluded that arousal signals affecting slow drifts in neuronal modulations are brain-wide. They also concluded that these signals are not present in the deepest SC layers, and they interpreted this to mean that this minimizes the impact of arousal on unwanted eye movements.

      Strengths:

      The paper is clear and well-written.

      Showing slow drifts in the SC activity is important to demonstrate that cortical slow drifts could be brain-wide.

      Weaknesses:

      The authors find that the SC cells with the low motor index are modulated by pupil diameter. However, this could be completely independent of an "arousal signal". These cells have substantial visual sensitivity. If the pupil diameter changes, then their activity should be influenced since the monkey is watching a luminous display. So, in this regard, the fact that they do not see "an arousal signal" in the most motor neurons (through the pupil diameter analyses) is not evidence that the arousal signal is filtered out from the motor neurons. It could simply be that these neurons are not affected by the pupil diameter because they do not have visual sensitivity. So, even with the pupil data, it is still a bit tricky for me to conclude that arousal signals are excluded from the "output layers" of the SC.

      Of course, the general conclusion is that the motor neurons will not have the arousal signal. It's just the interpretation that is different in the sense that the lack of the arousal signal is due to a lack of visual sensitivity in the motor neurons.

      I think that it is important to consider the alternative caveat of different amounts of light entering the system. Changes in light level caused by pupil diameter variations can be quite large. Please also note that I do not mean the luminance transient associated with the target onset. I mean the luminance of the gray display. It is a source of light. If the pupil diameter changes, then the amount of light reaching the visually sensitive neurons also changes.

      Comments on revised manuscript:

      The authors have addressed my first primary comment. For the light comment, I'm still not sure they addressed it. At the very least, they should explicitly state the possibility that the amount of light entering from the gray background can matter greatly, and it is not resolved by simply changing the analysis interval to the baseline pre-stimulus epoch. I provide more clear details below:

      In line 194 of the redlined version of the article (in the Introduction), the citation to Baumann et al., PNAS, 2023 is missing near the citation of Jagadisan and Gandhi, 2022. Besides replicating Jagadisan and Gandhi, 2022, this other study actually showed that the subspaces for the visual and motor epochs are orthogonal to each other.

      We thank the reviewer for this comment and apologize that the citation to Baumann et al., PNAS, 2023 was missing in the previous version of the manuscript. In addition to including this citation in the revised version, we have provided a much more comprehensive description of all three cited studies and clarified that, in addition to replicating the results of Jagadisan and Gandhi, Baumann et al., PNAS, 2023 showed that the subspaces for the visual and motor epochs are orthogonal to each other. The following lines have been added to the Introduction of the revised manuscript:

      “A similar separation has been observed for visual and motor responses in the SC (Jagadisan and Gandhi, 2022; Ayar et al., 2023; Baumann et al., 2023). For example, Jagadisan and Gandhi (2022) used linear microelectrode arrays to investigate why early eye movements are not triggered when neuronal responses to a visual target, presented before a delayed saccade to that target, cross a threshold. They found that population activity in the SC was less stable during the visual epoch of a delayed saccade task, relative to the saccade epoch. Moreover, saccades could be evoked more easily by patterned microstimulation when the temporal structure of the microstimulation was stable across electrodes, providing a potential explanation for how downstream regions differentiate between visual and motor responses. Similar results were reported by Baumann et al. (2023) who found that the strength of SC motor responses during a saccade to a visual image depends on the features of that image (e.g., contrast, orientation). When dimensionality reduction was applied to the spiking responses of neuronal populations in the SC, the population trajectory during the initial visual response to the image was orthogonal to that during the motor response. These findings replicate the separation in temporal population structure reported by Jagadisan and Gandhi (2022) and support the results of Ayar et al. (2023). They found that, although not completely orthogonal, population activity in the SC is distinct for visual and motor responses during the same oculomotor task and across different tasks, which could further facilitate the decoding of signals related to sensation, action and context by downstream regions.” (Lines 110-127)

      Line 683 (and around) of the redlined version of the article (in the Results): I'm very confused here. When I mentioned visual modulation by a changing pupil diameter, I did not mean the transient changes associated with the brief onset of the cue in the memory-guided saccade task. I meant the gray background of the display itself. This is a strong source of light. If the pupil diameter changes across trials, then the amount of light entering the eye from the gray background also changes. Thus, visually-responsive neurons will have different amounts of light driving them. This will also happen in the baseline interval containing only a fixation spot. The arguments made by the authors here do not address this point at all. So, please modify the text to explicitly state the possibility that the global luminance of the display (as filtered by the pupil diameter) alters the amount of light driving the visually-responsive neurons and could contribute to the stronger effects seen in the more visual neurons.

      We apologize that our analysis did not fully address the reviewer’s concern that the presence of fluctuations in visual neurons and their absence in motor neurons may have arisen indirectly due to changes in the amount of light entering the eye caused by changes in pupil size. As per the reviewer’s suggestion, we have now raised the possibility that visual neurons in the SC may have firing rates that are monotonically related to slow trends in overall luminance induced by pupil size changes, whereas motor neurons do not. Although we believe this to be an unlikely explanation, the paragraph from lines 374-398 has been modified to better describe this possibility, including the following text:

      “Given that slow drift is found in traditionally defined visual areas (e.g., area V4) and in regions that show mixed selectivity for multiple task variables (e.g., PFC) (Cowley et al., 2020), it seems unlikely that slow drift is caused by luminance fluctuations alone and more likely that it reflects global changes in arousal. At the same time, these arousal-related fluctuations covary with changes in pupil size (Johnston et al., 2022a), which could modulate the amount of light entering the eye from the display. This might affect visual neurons but not motor neurons due to their lack of visual sensitivity. Because SC neurons exist on a continuum, with visual responses decreasing and motor responses increasing from the intermediate to deep layers (Massot et al., 2019; Heusser et al., 2022) and no clear categorical boundary for motor-only neurons, any readout strategy would still need to avoid corruption of the motor output by slow drift, even if it were caused by changes in the amount of light entering the eye.” (Lines 387-398)

      The figures (everywhere, including the responses to reviewers) are very low resolution and all equations in methods are missing.

      We thank the reviewer for bringing this to our attention. We believe this issue may have arisen during conversion of the manuscript file for review, as the figures were of sufficient quality and the equations visible in the version that appeared online (https://doi.org/10.7554/eLife.99278.2). In any case, we will ensure that high-resolution figures are submitted with the revised manuscript and apologize that they were low resolution in the previous version.

      I'm very confused by Fig. 2 - supplement 2. Panel B shows a firing rate burst aligned to *microsaccade* onset. Does that mean you were in the foveal SC? i.e. how can neurons have a motor burst to the target of the memory-guided saccade and also for microsaccades? And which microsaccade directions caused such a burst? And what does it mean to compute the motor index and spike count for microsaccades in panel C? if you were in the proper SC location for the saccade target, then shouldn't you *not* get any microsaccade-related burst at all? This is very confusing to me and needs to be clarified.

      We agree that clarification is needed here and thank the reviewer for their comment. The eccentricity of the targets was set to match the endpoints of the evoked saccades, which for some sessions were relatively close to the fovea. The mean eccentricity of the targets across sessions was 4.52° (SD = 2.89°). These values are now reported in the Methods section of the revised manuscript (Line 637). For the neuron shown in Figure 2–figure supplement 2, the eccentricity of the targets was 3°. Previous research has shown that some SC neurons respond during microsaccades as well as slightly larger saccades (see Hafed & Krauzlis, 2012, J. Neurophysiol., Fig. 4B). This likely explains why the neuron shown in Figure 2–figure supplement 2, which had a receptive field at ~3° based on saccades evoked by microstimulation, also responded during microsaccades. We apologize that this was not explained in the previous version and agree that it could have been confusing for the reader. To address this, the legend for this supplementary figure has been edited in the revised version and now reads:

      “(B) PSTH for an SC neuron that responded around the time of a microsaccade. Firing rates were computed in 1ms bins, averaged across trials and smoothed using a Gaussian function (σ = 5ms). Note that the targets were set to 3º in this session based on saccades evoked by microstimulation (see Methods). Previous research has shown that some SC neurons respond during microsaccades as well as to slightly larger saccades (Hafed and Krauzlis, 2012). This likely explains why this SC neuron, which had a RF at ~3º based on saccades evoked by microstimulation, also responded around the time of a microsaccade.” (Lines 1026-1031)
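      The PSTH procedure described in this legend can be illustrated with a short sketch: spike times are binned at 1 ms, averaged across trials, converted to spikes/s, and convolved with a Gaussian kernel (σ = 5 ms). This is not the authors' analysis code. The function name `psth`, the ±200 ms window, the spike-time format (ms relative to microsaccade onset), and truncating the kernel at ±3σ are assumptions made for illustration.

```python
import math


def psth(trials, t_start=-200, t_stop=200, bin_ms=1, sigma_ms=5.0):
    """Trial-averaged, Gaussian-smoothed firing rate in spikes/s.

    trials: list of lists of spike times (ms) aligned to event onset
    (hypothetical format). Returns (bin_start_times, smoothed_rates).
    """
    n_bins = (t_stop - t_start) // bin_ms
    counts = [0] * n_bins
    for spikes in trials:
        for t in spikes:
            if t_start <= t < t_stop:
                counts[int((t - t_start) // bin_ms)] += 1
    # Summed counts -> mean rate per trial in spikes/s.
    scale = 1000.0 / (bin_ms * len(trials))
    rates = [c * scale for c in counts]
    # Gaussian kernel truncated at +/-3 sigma, normalized to unit sum.
    half = int(math.ceil(3 * sigma_ms / bin_ms))
    kernel = [math.exp(-0.5 * ((k * bin_ms) / sigma_ms) ** 2)
              for k in range(-half, half + 1)]
    total = sum(kernel)
    kernel = [w / total for w in kernel]
    # Convolve; the kernel is not renormalized near the window edges,
    # so bins within 3 sigma of an edge are attenuated.
    smoothed = []
    for i in range(n_bins):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n_bins:
                acc += w * rates[j]
        smoothed.append(acc)
    bins = [t_start + i * bin_ms for i in range(n_bins)]
    return bins, smoothed
```

      Because the kernel sums to one, smoothing preserves the spike count per trial away from the window edges; computing the PSTH over a window somewhat wider than the plotted one avoids the edge attenuation.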

    1. eLife Assessment

      This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored but is of interest to visual neuroscientists. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence indicating that attention near the fovea preferentially enhances low spatial frequencies is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the weaknesses noted above, which were raised in the previous round of review.]

      Summary:

      The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to mid-range spatial frequencies (4-8 cycles/degree), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.

      Strengths:

      The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.

      Weaknesses:

      The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.

    3. Reviewer #2 (Public review):

      Summary:

      This study aims to test whether foveal and non-foveal vision share the same mechanisms for exogenous attention. Specifically, they aim to test whether they can replicate at the foveola previous results regarding the effects of exogenous attention for different spatial frequencies.

      Strengths:

      Monitoring the exact place where the gaze is located at this scale requires very precise eye-tracking methods and accurate and stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal vision and non-foveal vision, adding more data supporting this parallel.

      Weaknesses:

      The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.

    4. Reviewer #3 (Public review):

      Summary:

      This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.

      Strengths:

      The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.

      Weaknesses:

      The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study explores how exogenous attention operates at the finest spatial scale of vision, within the foveola - a topic that has not been previously explored. The question is important for understanding how attention shapes perception, and how it differs between the periphery and the central regions of highest visual acuity. The evidence is compelling, as shown by carefully designed experiments with state-of-the-art eye tracking to monitor attended locations just a few tens of minutes of arc away from the fixation target, but additional clarification regarding analyses and implications for vision and oculomotor control would broaden the impact of the study.

      We thank the editors and reviewers for their thorough evaluation of our work. We have carefully revised the manuscript and substantially reworked the Discussion to address all of the points raised, eliminate redundancies, streamline the text, and clarify the implications of our findings for vision and oculomotor control. We have also expanded the documentation of our power analyses and conducted the additional analyses requested by the reviewers. Our point-by-point responses are provided below.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript investigates how exogenous attention modulates spatial frequency sensitivity within the foveola. Using high-precision eye-tracking and gaze-contingent stimulus control, the authors show that exogenous attention selectively improves contrast sensitivity for low- to midrange spatial frequencies (4-8 cycles/degree), but not for higher frequencies (12-20 CPD). In contrast, improvements in asymptotic performance at the highest contrast levels occur across all spatial frequencies. These results suggest that, even within the foveola, exogenous attention operates through a mechanism similar to that observed in peripheral vision, preferentially enhancing lower spatial frequencies.

      Strengths:

      The study shows strong methodological rigor. Eye position was carefully controlled, and the stimulus generation and calibration were highly precise. The authors also situate their work well within the existing literature, providing a clear rationale for examining the fine-grained effects of exogenous attention within the foveola. The combination of high spatial precision, gaze-contingent presentation, and detailed modeling makes this a valuable technical contribution.

      Weaknesses:

      The manipulation of attention raises some interpretive concerns. Clarifying this issue, together with additional detail about statistics, participant profiles, other methodological elements, and further discussion in relation to oculomotor control in general, could broaden the impact of the findings.

      We thank the reviewer for the helpful comments. In the Discussion, we have now considered additional factors that could have contributed to the observed attentional effects. First, the exogenous cue might have functioned as a temporal warning signal. However, the interval between cue and stimulus onset was fixed across trials, meaning that the cue did not provide temporal information beyond what participants could already anticipate. Furthermore, participants completed a large number of trials (≥ 4000), making it highly likely that the temporal relationship between trial onset and target onset was overlearned. These considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions.

      Another possibility is that the 100% validity of the exogenous cue could potentially have promoted endogenous attentional engagement. Yet, several characteristics of our task strongly limited the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms, and the target remained visible for only 50 ms. Under these temporal constraints, any voluntary, slow endogenous enhancement would primarily occur after the stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have substantially contributed to the observed attentional benefits in our task.

      Regarding the points on statistical reporting and participant details, we followed the reviewer’s suggestions by adding post hoc power analyses and providing more comprehensive reporting of the linear model outputs (see Appendices 1 and 2). We also expanded the description of the training procedures conducted with participants prior to formal data collection in the Methods section.

      We appreciate the reviewer for raising the important question of how our findings may relate to oculomotor control. To address this, we analyzed trials excluded from the manuscript due to saccades. This analysis revealed that saccade latencies were shorter in the valid condition than in the neutral condition (see Figure 2 — Supplementary Figure 2). This earlier saccade onset may reflect exogenously triggered preparatory activity in the oculomotor system in response to the salient cue. Future studies are needed to examine whether this preparatory mechanism serves to efficiently guide microsaccades or saccades toward behaviorally relevant stimuli in everyday vision. We have incorporated this point into the Discussion, highlighting a potential mechanistic link between exogenous attention and oculomotor behavior.

      Reviewer #2 (Public review):

      Summary:

      This study aims to test whether foveal and non-foveal vision share the same mechanisms for exogenous attention. Specifically, they aim to test whether they can replicate at the foveola previous results regarding the effects of exogenous attention for different spatial frequencies.

      Strengths:

      Monitoring the exact place where the gaze is located at this scale requires very precise eye-tracking methods and accurate and stable calibration. This study uses state-of-the-art methods to achieve this goal. The study builds on many other studies that show similarities between foveal vision and non-foveal vision, adding more data supporting this parallel.

      Weaknesses:

      The study lacks a discussion of the strength of the effect and how it relates to previous studies done away from the fovea. It would be valuable to know if not just the range of frequencies, but the size of the effect is also comparable.

      We thank the reviewer for raising these important issues. In response, we have expanded the Discussion to link our findings to prior work. First, we included a direct comparison of our effect sizes with those reported in previous studies. This analysis revealed that our effect sizes are highly comparable to those of earlier studies (see Figure 3 — Supplementary Figure 4). Second, we contextualized our findings within the framework of the normalization model of attention. We detected a mixture of contrast and response gain effects, consistent with predictions from the normalization framework given our experimental design. Finally, we extended the Discussion to consider potential underlying neural mechanisms. Specifically, we suggested that differences in attentional modulation between the fovea and extrafovea, particularly whether attention manifests as response gain or contrast gain, may reflect distinct characteristics of foveal neurons relative to those in extrafoveal regions.
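      The contrast-gain versus response-gain distinction can be made concrete with a Naka-Rushton contrast-response function, R(c) = R_max · c^n / (c^n + c50^n), a descriptive form often used alongside the normalization model of attention. The sketch below is illustrative only: the parameter values and the size of each attentional change are assumptions, not estimates from this study. Contrast gain is modeled as a lower semi-saturation contrast c50 (a leftward shift), response gain as a larger R_max (an upward scaling).

```python
def naka_rushton(c, r_max=1.0, c50=0.1, n=2.0):
    """Response to contrast c (0-1): r_max * c^n / (c^n + c50^n)."""
    return r_max * c ** n / (c ** n + c50 ** n)


contrasts = [0.01, 0.03, 0.1, 0.3, 1.0]

# Neutral condition with assumed baseline parameters.
neutral = [naka_rushton(c) for c in contrasts]

# Contrast gain: attention lowers c50 by an assumed factor of 1.5.
contrast_gain = [naka_rushton(c, c50=0.1 / 1.5) for c in contrasts]

# Response gain: attention scales r_max by an assumed factor of 1.2.
response_gain = [naka_rushton(c, r_max=1.2) for c in contrasts]

for c, a, b, r in zip(contrasts, neutral, contrast_gain, response_gain):
    print(f"c={c:<5} neutral={a:.3f} contrast_gain={b:.3f} response_gain={r:.3f}")
```

      Under these assumptions, contrast gain produces its largest benefit at intermediate contrasts and converges with the neutral curve at high contrast, whereas the response-gain benefit grows with contrast and is largest at the asymptote. In psychometric terms, these patterns are commonly read as a change in contrast sensitivity (threshold) versus a change in asymptotic performance, respectively.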

      Reviewer #3 (Public review):

      Summary:

      This paper explores how spatial attention affects foveal information processing across different spatial frequencies. The results indicate that exogenously directed attention enhances contrast sensitivity for low- to mid-range spatial frequencies (4-8 CPD), with no significant benefits for higher spatial frequencies (12-20 CPD). However, asymptotic performance increased as a result of spatial attention independently of spatial frequency.

      Strengths:

      The strengths of this article lie in its methodological approach, which combines a psychophysical experiment with precise control over the information presented in the foveola.

      Weaknesses:

      The authors acknowledge that they used the standard approach of analyzing observer-averaged data, but recognize that this method has limitations: it ignores the uncertainty associated with parameter estimates and the relationships between different parameters of the psychometric model. This may affect the interpretation of attentional effects. In the future, mixed-effects models at the trial level could overcome these limitations.

      We thank the reviewer for this comment. Our Methods section continues to transparently discuss these limitations, as well as the fact that these limitations are shared with most published studies in psychophysics. Additionally, we now include measures of uncertainty for all key effects (see Appendices 1 and 2), and we have reported effect sizes throughout the Results section. Finally, we have added post hoc power analyses to the Methods. Following previous approaches to power calculation for related experiments, we found that our study was sufficiently powered to detect the main effect of attention and had moderate power to detect the interaction between attention and spatial frequency.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The manipulation of attention raises some interpretive concerns. Since only valid and neutral cue conditions were included, the results might reflect differences in temporal predictability rather than true spatial reorienting of attention. In other words, the valid cue could act mainly as a temporal warning signal that reduces uncertainty about stimulus onset. Without invalid trials or a non-predictive control cue, it remains difficult to separate spatial and temporal contributions to exogenous attention.

      We thank the reviewer for raising this point. In this regard, we would like to clarify that there was no temporal uncertainty in stimulus onset: across all conditions and trial types, the stimulus was presented at the same time relative to the start of the trial, i.e., 600 ms after the start. Yet, we acknowledge that the shorter temporal proximity between the cue and stimulus in valid trials could serve as an additional temporal warning signal, potentially conferring an advantage relative to the neutral condition. While we cannot completely rule out a contribution of such temporal cueing within the constraints of the current experimental design, we believe its impact was limited. Specifically, the fixed cue-stimulus interval reduced the cue’s ability to convey additional temporal information. Furthermore, observers completed a large number of trials (≥4000), and the temporal contingency between trial onset and target onset was likely overlearned. Taken together, these considerations indicate that the observed benefit in the valid condition was predominantly attributable to spatial reorienting induced by the cue, rather than to differences in the temporal predictability of the target across conditions. We now mention this in the revised Discussion (lines 309-318).

      We recognize that the original Figure 2 illustrating the experimental paradigm may have caused confusion regarding the timing structure of the task. We have therefore updated the figure to more explicitly illustrate the trial timeline in both conditions.

      (2) The reported effects seem small, and no power analysis is provided. With only seven participants, the study may not have enough statistical power to confirm that the observed differences are reliable or generalizable. Although the technical precision in gaze and stimulus control is impressive, it cannot offset the limitations of a small sample. The authors should include effect size estimates, confidence intervals, and ideally a post-hoc power analysis.

      The statistical results are reported only as χ² values from model comparisons, which do not show the direction or size of the effects. For clarity and transparency, these tests should be accompanied by fixed-effect estimates with their standard errors and confidence intervals, so readers can better assess both the reliability and perceptual relevance of the findings.

      The reviewer raised several important points regarding the study's statistical rigor.

      In the revised manuscript, we now report effect size estimates (Cohen’s d) in the Results section and Appendices. Effect sizes were in the medium-to-large range, including the effect of attention on contrast sensitivity at 4 and 8 CPD, and the difference in attentional benefit on contrast sensitivity between 4 and 12 CPD and between 8 and 12 CPD. We have also included the full model outputs, including standard errors and confidence intervals, in the Appendices.

      The sample size for the current study was determined based on the magnitude of the attentional effects observed in our previous work (Guzhang et al., 2021). The experimental design and dependent measures were highly similar across the two studies, and the prior study revealed a robust effect, which accounted for a substantial proportion of within-observer variance in a tightly controlled repeated-measures design.

      We have revised the manuscript to add bootstrap-based power estimates, following the procedure described by Jigo and Carrasco (2020) and using data from Guzhang et al. (2021). Assuming that the effect size in the current study would be comparable to that in the prior one, we randomly sampled between 2 and 12 observers with replacement and tested for a main effect of attention with a one-way repeated-measures ANOVA. For each sample size, this procedure was repeated 10,000 times, and power was estimated as the proportion of iterations yielding a significant main effect. This analysis indicates that five observers would have been sufficient to achieve approximately 80% power to detect the main effect of attention in the prior study. Based on these estimates, the sample size of the current study (seven observers) provides adequate power.
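      A minimal sketch of this bootstrap procedure follows. It assumes two attention levels (valid, neutral) and one valid-minus-neutral effect per observer in the prior dataset; under that assumption the one-way repeated-measures ANOVA is equivalent to a paired t-test (F = t²), which keeps the example dependency-free. The function names, the hard-coded critical t values, and the handling of degenerate resamples are illustrative choices, not part of the published analysis.

```python
import random

# Two-sided critical t values (alpha = 0.05) for df = 1..11, i.e. n = 2..12.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def is_significant(diffs):
    """Paired test on per-observer valid-minus-neutral differences.

    With two attention levels, the repeated-measures ANOVA F equals the
    square of this paired t statistic, so both give the same decision.
    """
    n = len(diffs)
    if len(set(diffs)) == 1:
        # All resampled values identical (e.g. one observer drawn n times):
        # zero variance makes the test undefined; count it as a miss.
        return False
    mean = sum(diffs) / n
    sd = (sum((d - mean) ** 2 for d in diffs) / (n - 1)) ** 0.5
    t = mean / (sd / n ** 0.5)
    return abs(t) > T_CRIT_05[n - 1]


def bootstrap_power(observer_effects, sizes=range(2, 13), n_iter=10_000, seed=0):
    """Proportion of with-replacement resamples giving a significant effect."""
    rng = random.Random(seed)
    power = {}
    for n in sizes:
        hits = sum(is_significant(rng.choices(observer_effects, k=n))
                   for _ in range(n_iter))
        power[n] = hits / n_iter
    return power


def smallest_n(power, target=0.8):
    """Smallest sample size reaching the target power, or None."""
    passing = [n for n, p in power.items() if p >= target]
    return min(passing) if passing else None
```

      A call such as `smallest_n(bootstrap_power(prior_effects))`, with `prior_effects` a hypothetical list of per-observer effects, returns the smallest sample size at which the estimated power reaches 80%.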

      We also conducted a post hoc power analysis to evaluate the power of our design to detect the main effects and their interaction. It was performed using the R package simr, which estimates statistical power for mixed-effects models through model-based simulation. Specifically, simr generated datasets based on the fixed- and random-effect structure of the fitted model, preserving the observed effect sizes and variance components. For each simulated dataset, the model was refit, and the effect of interest was tested. By repeating this procedure 501 times across different sample sizes, power was estimated as the proportion of simulations in which the effect was statistically significant. Based on these post hoc simulations, we estimated that our study had high power (>95%) to detect the main effects and moderate power (>65%) to detect the interaction. Although the estimated power for the interaction was lower than for the main effects, the observed effect size was substantial (as indexed by Cohen’s d), indicating that the interaction was not trivially small.
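      The simulate-test-count logic behind simulation-based power can be illustrated without simr, with an important caveat: this sketch is a deliberate simplification and not what simr does. simr simulates from and refits the full fitted mixed model, whereas here data are drawn from a hypothetical random-intercept, random-slope model and tested with a paired t-test on observer means (a two-stage summary approach, swapped in to stay dependency-free). All parameter names and values are assumptions; only the 501 simulations per sample size follows the text above.

```python
import random

# Two-sided critical t values (alpha = 0.05) for df = 1..11, i.e. n = 2..12.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
             7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def simulate_observer_diff(rng, beta, sd_slope, sd_noise, n_trials):
    """Valid-minus-neutral mean for one simulated (hypothetical) observer."""
    slope = rng.gauss(beta, sd_slope)      # observer-specific attention effect
    intercept = rng.gauss(0.0, 1.0)        # observer baseline; cancels in the difference
    neutral = [intercept + rng.gauss(0.0, sd_noise) for _ in range(n_trials)]
    valid = [intercept + slope + rng.gauss(0.0, sd_noise) for _ in range(n_trials)]
    return sum(valid) / n_trials - sum(neutral) / n_trials


def simulated_power(n_obs, beta, sd_slope, sd_noise, n_trials=50, n_sim=501, seed=0):
    """Fraction of simulated experiments with a significant attention effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        d = [simulate_observer_diff(rng, beta, sd_slope, sd_noise, n_trials)
             for _ in range(n_obs)]
        m = sum(d) / n_obs
        sd = (sum((x - m) ** 2 for x in d) / (n_obs - 1)) ** 0.5
        t = m / (sd / n_obs ** 0.5)
        if abs(t) > T_CRIT_05[n_obs - 1]:
            hits += 1
    return hits / n_sim
```

      With `beta = 0` the estimate approximates the nominal false-positive rate (about 5%), which is a useful sanity check on any simulation-based power pipeline before interpreting power for nonzero effects.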

      We now describe these analyses in lines 501-532 in the Methods section.

      (3) The task seems quite demanding, requiring fine spatial discrimination, very small stimuli, and head stabilization with a bite bar. It is not clear whether participants were naïve or experienced observers. If they had prior psychophysical training, practice effects could have influenced the results, particularly given the lack of invalid trials. The manuscript would benefit from clarifying participants' experience level and describing any training or familiarization procedures.

      We appreciate the reviewer’s concern regarding potential training effects. All observers had prior experience with similar tasks, but were naïve to the scope of this study. Each participant underwent an initial familiarization phase of approximately 50 trials with the experimental setup of this study. They then completed an additional ~50 trials to estimate their individual contrast thresholds per spatial frequency level before we proceeded with data collection at the five predefined contrast levels.

      In our experience with experiments similar to the one described here, observers quickly adapt to the setup and are generally able to maintain reliable fixation and stable performance, even during the initial training phase. In addition, each participant completed approximately 400 trials before data collection started. Even observers who began the session with no prior experience would have become practiced with the setup by the start of data collection, during which ~4000 trials were collected per observer. Therefore, whether an observer had participated in previous experiments is unlikely to meaningfully affect the results, as the large number of trials ensures comparable levels of task familiarity across individuals.

      Crucially, valid and neutral trials were interleaved throughout the session. Any general learning or practice would therefore influence both conditions equally. Despite this, we still observed clear performance improvements in the valid condition relative to the neutral condition, indicating that the observed benefits cannot be attributed solely to practice and reflect an attentional enhancement. We have added elaboration on the training procedures in Methods (lines 411-429).

      Finally, we recognize that the lack of invalid trials may raise concerns given our 100% spatially predictive cue, as noted in Reviewer 3’s first comment. We refer the reader to our response to that point for a more detailed discussion of cue validity and the distinction between exogenous and endogenous influences in our paradigm.

      (4) The study would benefit from a clearer connection between the behavioral results and possible underlying neural mechanisms. How might the observed changes in contrast sensitivity relate to known physiological processes at the retinal, thalamic, or cortical level? The discussion could be strengthened by framing the findings within established models of attentional modulation or by referring to known effects of attention in the early visual cortex.

      This is an important point, and we agree that framing the findings within established models of attentional modulation can strengthen the discussion. We believe that the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) offers a useful framework for interpreting our behavioral findings, especially the attention-related changes in contrast sensitivity and asymptotic performance observed at the foveal scale. We have now added a more detailed discussion linking our results to this model and considering, explicitly as speculation, how known physiological processes at different stages may contribute to the observed effects in Discussion (lines 264-307).

      (5) The ecological relevance of the results is not fully developed. The authors propose that the observed effects may resemble natural attentional shifts triggered by salient events, yet the brief, highly localized flashes used here are somewhat artificial. A more likely interpretation is that these mechanisms relate to oculomotor control within the fovea, perhaps reflecting preparatory activity for microsaccades or fine fixation adjustments. Considering this view could broaden the impact of the findings and link them to current discussions on the relationship between attention and oculomotor control.

      We thank the reviewer for raising this important point regarding the ecological relevance of our findings, which we did not sufficiently address in the original manuscript. Although we briefly motivated scenarios that engage exogenous attention at high spatial resolution, such as detecting road signs or traffic lights at a distance while driving, we did not fully elaborate on how such attentional processes may link to downstream visual and oculomotor functions.

      In our experiment, observers maintained fixation and avoided saccades throughout the trial. Nevertheless, in a subset of trials (on average 17% ± 3%), observers made saccades after the stimuli disappeared and before providing a response. Typically, these movements were microsaccades with amplitudes smaller than 0.5°, directed toward the target location, in both valid and neutral trials. These trials were excluded from the analyses reported in the manuscript. Inspired by the reviewer’s feedback, we examined saccade latency in these trials, relative to the onset of the response cue, to assess whether exogenous cueing influenced oculomotor timing. Notably, we observed an earlier onset of microsaccades in valid compared to neutral trials (71 ms ± 50 ms faster, P < 0.01). We have now added this observation as Figure 2 — Supplementary Figure 2 in the manuscript. Because the presence of an exogenous pre-cue was the only difference between the two trial types, the earlier microsaccade onset likely reflects exogenously triggered preparatory activity in the oculomotor system in response to the salient pre-cue. Such fine-grained attention may prime potential eye movements toward behaviorally relevant stimuli for further examination. This interpretation is consistent with the reviewer’s suggestion and supports a mechanistic link between exogenous attention and oculomotor behavior, extending the ecological relevance of our findings. This point has been added to the Discussion (lines 329-340).

      We also conducted an analysis to examine ocular drift behavior following the response cue. Although trials included in the manuscript analyses were constrained such that fixation during target presentation remained within a small window (10’ radius) around the fixation marker, we did not assess whether gaze subsequently drifted closer to the target location after the response cue. One possibility is that exogenous attention might bias ocular drift, shifting the preferred locus of fixation closer to the target. To address this, we computed the average Euclidean distance between gaze position and the target location following response cue onset for valid and neutral trials. However, we found no significant difference in gaze-target distance between valid and neutral trials (p = 0.57).
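      The fixation criterion and the gaze-target distance measure can be sketched as follows. Assumptions: gaze samples and target positions are (x, y) coordinates in arcminutes in a common frame, the fixation marker sits at the origin, and each trial contributes one mean distance that is then averaged per condition. The statistical test behind the reported p-value is not specified here and is not implemented. Function names are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math


def within_fixation_window(gaze, fixation=(0.0, 0.0), radius_arcmin=10.0):
    """True if every gaze sample lies within radius_arcmin of the fixation marker."""
    return all(math.dist(p, fixation) <= radius_arcmin for p in gaze)


def mean_gaze_target_distance(gaze, target):
    """Average Euclidean distance (arcmin) between gaze samples and the target."""
    return sum(math.dist(p, target) for p in gaze) / len(gaze)


def condition_means(trials):
    """trials: iterable of (condition, gaze_samples, target_xy).

    Returns the mean per-trial gaze-target distance for each condition.
    """
    sums, counts = {}, {}
    for cond, gaze, target in trials:
        d = mean_gaze_target_distance(gaze, target)
        sums[cond] = sums.get(cond, 0.0) + d
        counts[cond] = counts.get(cond, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}
```

      In this sketch, `within_fixation_window` would be applied to the samples recorded during target presentation and `mean_gaze_target_distance` to the samples after response-cue onset, so the two measures use different portions of each trial.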

      Although the spatial cueing approach has long been used to probe exogenous attention in a controlled manner in psychophysical experiments, we fully recognize the importance of understanding attention under more naturalistic viewing conditions that allow observers to freely move their eyes. Developing paradigms that incorporate more naturalistic, salient stimuli would be an important direction for future work, enabling investigation of exogenous attention in ecologically valid settings and its influence on sequential actions and processes, including oculomotor behavior.

      (6) There is no statement about the availability of the data and code used for the experiment.

      We have now added the data and code for the analysis pipeline to the Open Science Framework (OSF).

      Reviewer #2 (Recommendations for the authors):

      (1) The study could discuss the strength of the effect and how it relates to previous studies.

      We thank the reviewer for raising this point. To facilitate direct comparison with the study by Jigo and Carrasco (2020), we computed attentional benefit as the ratio of contrast sensitivity between the valid and neutral conditions (now shown in Figure 3 — Supplementary Figure 4). In their data, the attentional benefit at 0° eccentricity peaked just below 4 CPD, with a ratio of approximately 1.2, corresponding to a ~20% increase in contrast sensitivity. This magnitude closely matches the benefit we observed for fine-grained attentional shifts within the foveola at spatial frequencies between 4 and 8 CPD (17% ± 12% and 16% ± 14% for 4 and 8 CPD, respectively). We have added this comparison to the Discussion (lines 246-262).
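      The ratio-based benefit is simple arithmetic. Because contrast sensitivity (CS) is the reciprocal of the contrast threshold, CS_valid / CS_neutral = threshold_neutral / threshold_valid, and a ratio of 1.2 corresponds to a 20% increase. The sketch below computes per-observer percent benefits and a mean ± SD summary. The function names, and the assumption that the reported ± values are sample standard deviations, are illustrative.

```python
import statistics


def contrast_sensitivity(threshold):
    """Contrast sensitivity is the reciprocal of the contrast threshold."""
    return 1.0 / threshold


def percent_benefit(cs_valid, cs_neutral):
    """Attentional benefit as a percent change: (CS_valid / CS_neutral - 1) * 100."""
    return (cs_valid / cs_neutral - 1.0) * 100.0


def summarize(benefits):
    """Mean and sample SD of per-observer benefits (assumed summary statistic)."""
    return statistics.mean(benefits), statistics.stdev(benefits)
```

      For example, thresholds of 0.06 (neutral) and 0.05 (valid) give a CS ratio of 1.2, so `percent_benefit(contrast_sensitivity(0.05), contrast_sensitivity(0.06))` evaluates to approximately 20.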

      In addition, we acknowledge that prior studies have reported heterogeneous attentional effects, including pure contrast gain, pure response gain, or a mixture of the two. We now explicitly reference these findings in the Discussion and use the normalization model of attention (Reynolds and Heeger, 2009; Herrmann et al., 2010) to explain how differences in stimulus configuration, attention field size, and eccentricity may account for discrepancies between our findings and those of prior studies examining attention in the extrafovea or attention broadly distributed across the fovea (lines 264-307).

      (2) Minor details:

      (a) The abstract mentions gaze-contingent-display, but if I understand correctly, the stimulus was not presented in a gaze-contingent manner.

      That’s correct. Although stimuli were not presented gaze-contingently, we used a gaze-contingent calibration procedure (see Methods, lines 386-389) to achieve higher precision in localizing the line of sight. This increased accuracy was essential for selecting trials in which stimuli remained at the intended eccentricity relative to the preferred locus of fixation. To avoid potential confusion, however, we have removed this detail from the abstract.

      (b) Line 361: What is the manual calibration the authors are referring to? It does not appear to be described.

      The text has been updated to explain more explicitly what the automatic and manual calibration procedures are.

      (c) Line 402: There may be a typo towards the end of the line: should "t0" be "to"?

      The text has been updated. Thank you.

      (d) Line 405. What are the units of 30?

      The units are arcminutes; the text has been updated.

      Reviewer #3 (Recommendations for the authors):

      I found this paper very interesting, with a solid methodological approach and excellent data analyses. The authors present a well-designed psychophysical study that contributes valuable insights into the mechanisms of attention in the foveola. The methodology is rigorous, and the analyses are thoughtfully conducted and clearly presented.

      That said, I would like to offer a few comments and suggestions for clarification and further consideration:

      (1) Exogenous attention:

      If a 100% spatially predictive cue is compared to a neutral cue, the observed attentional effect should not be described as (purely) exogenous, since the cue fully predicts where the post-cue will request a response. This situation represents a case in which attention is exogenously driven but endogenously maintained (see e.g., Chica et al., 2013, Behavioural Brain Research). I recommend clarifying this distinction in the manuscript (and title) to avoid conceptual ambiguity.

      We thank the reviewer for raising this important conceptual point. We agree that because the pre-cue was 100% spatially predictive, the resulting attentional allocation cannot be considered purely exogenous. Although the abrupt, salient onset of the cue obligatorily triggers an exogenous shift of attention, its validity could also promote endogenous maintenance of attention at the cued location. However, several characteristics of our task strongly limit the extent to which such endogenous engagement could meaningfully influence performance. Endogenous attentional benefits typically emerge only after ~150-200 ms (Posner & Petersen, 1990; Carrasco, 2011), whereas our cue-target SOA was 100 ms and the target remained visible for only 50 ms, so the target was extinguished 150 ms after cue onset. Under these temporal constraints, any slow, voluntary endogenous enhancement would largely occur after stimulus offset. Thus, although endogenous maintenance is theoretically possible given the cue’s validity, it is unlikely to have contributed substantially to perceptual encoding in our task.

      We also considered the possibility that our response cue (a retro-cue indicating the target location) might recruit endogenous attention to the internal perceptual representation. Importantly, however, this retro-cue was equally informative in valid and neutral conditions. Any enhancement driven by the retro-cue should therefore benefit both trial types to the same extent. The fact that we still observe a robust advantage in valid trials supports the conclusion that the performance improvements predominantly reflect fast, spatially specific exogenous facilitation rather than slower endogenous processes.

      We have revised the manuscript to clarify that although the cue obligatorily triggers an exogenous attentional shift, its 100% validity could allow for endogenous maintenance of attention, as shown by Chica et al. (2013). We also added an explanation of why such endogenous contributions are unlikely to drive our main results, given the rapid cue-target timing in our task (Discussion, lines 319-327). Finally, to further prevent ambiguity, we updated the manuscript title to refer to “exogenously triggered attention” rather than simply “exogenous attention.”

      (2) Interpretation of statistical effects:

      The statement "Therefore, asymptotic performance showed only independent, additive effects of frequency and attention, without a systematic influence of spatial frequency on the attentional benefit" seems not to be supported by the data, as the main effect of frequency was not significant.

      We thank the reviewer for this helpful observation. We agree that the original phrasing did not accurately reflect the results, as the main effect of spatial frequency was not significant (p = .0545). We have revised the sentence to “Therefore, asymptotic performance reflected an effect of attention alone, with no detectable contribution of spatial frequency or of the interaction between spatial frequency and attention” to avoid implying such an effect (lines 210-211).

      If data from two participants were missing in one condition, the authors should consider replacing this data with new participants.

      We agree with the reviewer that having two observers with missing data in one condition is not ideal. However, the 20 cpd condition was deliberately positioned near the resolution limit at the tested eccentricity and was therefore extremely demanding. Observers also had to monitor two stimulus locations simultaneously, further increasing task difficulty. This condition was challenging for all observers and, despite testing up to the highest contrast, two of seven observers were unable to perform above chance, indicating that for a non-trivial fraction of observers, this condition was effectively unmeasurable with our paradigm. As noted in the manuscript, the 20 cpd condition also has a statistical limitation: thresholds clustered near the upper bound (approaching 100% contrast), compressing the dynamic range and markedly reducing variance relative to lower spatial frequencies, which violates the homoscedasticity assumption of linear models. For these reasons, we did not pursue additional data collection in this condition. Nevertheless, we report the data that were successfully obtained, as they remain informative about performance near the resolution limit.

      Finally, we note that even when setting aside the 20 CPD condition, our data support a spatial-frequency dependence of the attentional benefit: comparisons between 4 and 12 CPD, as well as between 8 and 12 CPD, revealed large differences in the magnitude of the attentional benefit (d = 0.65, 95% CI [0.11, 1.18] and d = 0.62, 95% CI [0.08, 1.14], respectively). To further quantify these effects, we now report Cohen’s d for these spatial-frequency comparisons in the Results text and in the Appendix tables.

      (3) Sample size:

      As this is a psychophysical experiment with many trials and few participants, I am curious about how the authors determined the appropriate sample size and the number of trials required to detect the expected effects. Given that many effects were found to be significant, it seems that statistical power was adequate; however, it would be helpful if the authors could explain how this issue was addressed a priori during experimental planning.

      We appreciate that the reviewer raised this point. Please see the reply to the second point from Reviewer 1, who raised a related question about statistical power.

      (4) Figure 2 clarification:

      In Figure 2B, I do not fully understand the "Valid" and "Neutral" representation. Both conditions include a post-cue indicating the right position; however, in the neutral condition, there is a central fixation square, whereas in the valid condition, there is not. Please clarify this aspect of the figure. I think I understood the paradigm, but this part of the figure is misleading.

      The pre-cue is present only in the valid condition. However, the fixation marker was mistakenly omitted from the valid condition in panel B.

      We thank the reviewer for pointing this out. We have updated Figure 2 to explicitly show the sequence of valid vs. neutral trials. The fixation mark remained on the screen throughout the trial in both the valid and neutral conditions. After a 500 ms fixation period, an exogenous cue was presented for 30 ms in valid trials, followed by a 70 ms interval before stimulus onset. In neutral trials, no cue was presented, and the screen remained blank for 100 ms before the stimuli appeared. In both conditions, a response cue appeared 50 ms after stimulus offset.
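      To make the trial sequence described above concrete, here is a minimal sketch that reconstructs event onsets for valid and neutral trials. It assumes the durations stated in this reply (500 ms fixation, 30 ms cue plus 70 ms gap, or a 100 ms blank) and the 50 ms target duration given in the reply on exogenous attention; the constant and function names are hypothetical and are not taken from the authors' code.

```python
# Hypothetical reconstruction of the trial timeline (all times in ms,
# relative to fixation onset). The fixation mark stays on screen throughout.
FIXATION_MS = 500
CUE_MS = 30                 # exogenous pre-cue, valid trials only
CUE_TO_STIM_GAP_MS = 70     # interval between cue offset and stimulus onset
STIM_MS = 50                # target duration
RESPONSE_CUE_DELAY_MS = 50  # response cue appears this long after stimulus offset


def trial_events(valid: bool) -> list[tuple[str, int]]:
    """Return (event, onset_ms) pairs for one valid or neutral trial."""
    events = [("fixation", 0)]
    t = FIXATION_MS
    if valid:
        events.append(("pre-cue", t))
    # Both trial types share the same 100 ms cue-to-stimulus interval;
    # in neutral trials the screen simply stays blank for that period.
    t += CUE_MS + CUE_TO_STIM_GAP_MS
    events.append(("stimulus", t))
    t += STIM_MS
    events.append(("stimulus offset", t))
    t += RESPONSE_CUE_DELAY_MS
    events.append(("response cue", t))
    return events


if __name__ == "__main__":
    for label, valid in (("valid", True), ("neutral", False)):
        print(label, trial_events(valid))
```

      Laid out this way, the target disappears 150 ms after cue onset in valid trials, which is at the lower end of the ~150-200 ms window in which endogenous benefits typically emerge.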

    1. eLife Assessment

      This is an important and rigorous study that addresses the question of what determines the spatial organization of endocytic zones at synapses. The authors use convincing approaches, in both Drosophila and rodent model systems, to define the role of activity and active zone structure on the organization of the peri-active zone. While the findings are primarily negative, they are carefully executed and contribute to the field by refining existing models of presynaptic organization.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to evoked activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α.

      Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and the data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by upstream organizers distinct from those of active zone proteins. It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.

      Strengths:

      The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.

      Weaknesses:

      One limitation, acknowledged by the authors, is the persistence of spontaneous activity at these synapses, which could still impact the organization of these regions.

      Comments on revisions:

      The authors have addressed all of my previous comments.

    3. Reviewer #2 (Public review):

      Summary:

      This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using genetic and pharmacological perturbations in both Drosophila and mouse neurons, the authors show that key endocytic proteins remain localized to periactive zones even when evoked release or active zone architecture is disrupted. While the findings are largely negative, the study is methodologically solid and provides useful constraints for current models of synaptic vesicle recycling.

      Strengths:

      The experimental design is careful and systematic, spanning both fly and mammalian systems. The use of advanced genetic models, including Liprin-α quadruple knockout mice, is a notable strength. High-resolution imaging approaches (STED, Airyscan) are appropriately applied to assess nanoscale organization. The study clarifies that strict activity dependence of endocytic recruitment may not be a general principle.

      Weaknesses (largely addressed in revision):

      Several initial concerns have been satisfactorily addressed in the revised manuscript. In particular, the inclusion of EndoA/Dap160 experiments and the expanded discussion improve the work. Some limitations remain, including the reliance on Tetanus toxin at the Drosophila NMJ, which does not fully abolish presynaptic fusion, and the still limited insight into the mechanistic basis of periactive zone organization. The biological interpretation of small changes in protein levels upon silencing also remains somewhat unclear.

      Comments on revisions:

      I thank the authors for the careful revision of the manuscript. The additional experiments, in particular the inclusion of EndoA and Dap160 at the Drosophila NMJ, as well as the extended discussion of limitations, are appreciated and address important points raised in the first round.

      While the principal conclusions of the study remain unchanged, and the manuscript is still largely based on negative results, I find that the authors now present these data in a more balanced and transparent manner. The discussion of activity-dependence is improved and more nuanced, especially with regard to possible contributions of spontaneous release and homeostatic effects.

      In my opinion, despite the mostly negative nature of the findings, the work provides a valuable and relevant contribution, as it defines important constraints on current models of periactive zone organization. The study is technically strong, carefully executed, and systematically performed across different model systems.

      Overall, the revised manuscript is clearly improved and represents a solid and well-executed piece of work that will be of interest to the field.

    4. Reviewer #3 (Public review):

      Summary:

      This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.

      Strengths:

      The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.

      Weaknesses:

      The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.

      This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.

      Comments on revisions:

      The authors responded to the initial review with care. They both revised the manuscript and conducted new experiments to address each reviewer's concern. The responses to the review were effective, and I think that the revised manuscript provides significant new insights. In my view, it does not require additional revisions.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful consideration of our work and constructive comments. We are glad that reviewers appreciated the rigor and value of our work. In response to the reviewers’ comments, we have made the following changes:

      (1) Addition of new experiments on EndoA localization at the Drosophila NMJ (Fig. 2).

      (2) Addition of new experiments on Dap160 localization at the Drosophila NMJ (Fig. 2).

      (3) Addition of new experiments to validate Dynamin, Dap160 and EndoA antibodies (Fig. 2 – figure supplement 1).

      (4) Assessment of the activity-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 3).

      (5) Assessment of the liprin-dependence of EndoA and Dap160 localization at the Drosophila NMJ (Fig. 8).

      (6) Addition of a limitations section to the discussion to directly address the fact that spontaneous release was not fully abolished in our studies and might contribute to recruitment.

      (7) Addition of an outlook to the same section on what experimental avenues could address the limitations in the future.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Emperador-Melero et al. seek to determine whether recruitment of endocytic machinery to the periactive zone is activity-dependent or tethered to delivery of active zone machinery. They use genetic knockouts and pharmacological block in two model synapses - cultured mouse hippocampal neurons and Drosophila neuromuscular junctions - to determine how well endocytic machinery localizes after chronic inhibition or acute depolarization by super-resolution imaging. They find that acute depolarization in both models has minimal to no effect on the localization of endocytic machinery at the periactive zone, suggesting that these proteins are constitutively maintained rather than upregulated in response to transient activity. Interestingly, chronic inhibition slightly increases endocytic machinery levels, implying a potential homeostatic upregulation in preparation for rebound depolarization. Using genetic knockouts, the authors show that localization of endocytic machinery to periactive zones occurs independently of proper active zone assembly, even in the absence of upstream organizers like Liprin-α. Overall, they propose that the constitutive deployment of endocytic machinery reflects its critical role in facilitating rapid and reliable membrane internalization during synaptic functions beyond classical endocytosis, such as regulation of the exocytic fusion pore and dense-core vesicle fusion. Although many experiments reveal limited changes in the localization or abundance of endocytic machinery, the findings are thorough, and data substantially support a model in which endocytic components are organized through a pathway distinct from that of the active zone. This work advances our understanding of synaptic dynamics by supporting a model in which endocytic machinery is constitutively recruited and regulated by distinct upstream organizers compared to active zone proteins. It also highlights the utility of super-resolution imaging across diverse synapse types to uncover functionally conserved elements of synaptic biology.

      We thank the reviewer for the positive assessment of our study.

      Strengths:

      The study's technical strengths, particularly the use of super-resolution microscopy and rigorous image analyses developed by the group, bolster their findings.

      We thank the reviewer for highlighting the technical strength of our work.

      Weaknesses:

      One notable limitation, however, is the absence of interrogation of endocytic proteins previously suggested to be recruited in an activity-dependent manner, in particular, endophilin.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high-frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins to the periactive zone at the Drosophila NMJ did not depend on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for the differences between our findings and previous work on Endophilin, which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord versus Drosophila NMJ terminals.” Further, that study found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function, suggesting that the plasma membrane is the primary functional compartment for Endophilin. Taken together with our work, these data suggest that Endophilin localizes constitutively, though not exclusively, to the periactive zone.

      Reviewer #2 (Public review):

      Summary:

      This study examines whether the localization of endocytic proteins to presynaptic periactive zones depends on synaptic activity or active zone scaffolds. Using a combination of genetic and pharmacological perturbations in Drosophila and mouse neurons, the authors show that proteins such as Dynamin, Amphiphysin, AP-180, and others are still recruited to periactive zones even when evoked release or active zone architecture is disrupted. While the results are mostly negative, the study is methodologically solid and contributes to a more nuanced understanding of synaptic vesicle recycling machinery.

      We thank the reviewer for deeming our work solid and for highlighting its importance for the field.

      Strengths:

      (1) The experimental design is careful and systematic, covering both fly and mammalian systems.

      (2) The use of advanced genetic models (e.g., Liprin-α quadruple knockout mice) is a notable strength.

      (3) High-resolution imaging (STED, Airyscan) is well used to assess spatial localization.

      (4) The findings clarify that certain core assumptions - such as strict activity dependence of endocytic recruitment - may not hold universally.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      (1) The study would benefit from a clearer positive control to demonstrate activity-dependent recruitment (e.g., Endophilin).

      We have added experiments measuring the localization of Endophilin (EndoA), a protein previously reported to localize to the synaptic vesicle cloud [1], at Drosophila NMJs (Figs. 2 and 3). We observed that EndoA localized both to the synaptic vesicle cloud and to the periactive zone. While stimulation did not enhance levels in either compartment, this outcome does not exclude shuttling of protein between compartments during activity. Nevertheless, our data support a model in which EndoA, like the other endocytic proteins tested, is present at the periactive zone at rest.

      (2) The reliance on Tetanus toxin in the Drosophila NMJ experiments in my eyes is a limitation, as it does not block all presynaptic fusion events; this should be discussed more directly.

      We agree with the reviewer’s point. To discuss it more directly, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (lines 518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” (lines 519-523).

      (3) The potential role of Dynamin in organizing other periactive zone proteins is not addressed and could be an important next step.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Some small changes in protein levels upon silencing are reported; their biological meaning (e.g., compensation vs. variability) is not fully clarified.

      These changes might include homeostatic adaptations. In the revised version of the manuscript, this is addressed on lines 135-137 and 405-407. We think it is overall difficult to assign biological meaning to small-magnitude changes, and chose to highlight the main point that there are no large-magnitude changes.

      (5) While alternative organizing mechanisms (actin, lipids, adhesion molecules) are mentioned, a more forward-looking discussion of how to test these models would be helpful.

      Following the reviewer’s suggestion, we have added an outlook section to the discussion where we provide suggestions for future studies (lines 510-543).

      (6) The authors should consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We have included new experiments on EndoA at the fly neuromuscular junction (Fig. 2, Fig. 3, Fig. 8, Fig. 3 – figure supplement 1) and have added appropriate discussion of these findings as outlined above.

      Reviewer #3 (Public review):

      Summary:

      This study examines how synaptic endocytic zones are positioned using a combination of cultured neurons and the Drosophila neuromuscular junction. The authors test whether neuronal activity, active zone assembly, or liprin-α function is required to localize endocytic zone markers, including Dynamin, Amphiphysin, Nervous Wreck, PIPK1γ, and AP-180. None of the manipulations tested caused a coordinated disruption in the localization or abundance of these markers, leading to the conclusion that endocytic zones form independently of synaptic activity and active zone scaffolds.

      We thank the reviewer for reviewing our work.

      Strengths:

      The work is systematic and carefully executed, using multiple manipulations and two complementary model systems. The authors consistently examine multiple molecular markers, strengthening the interpretation that endocytic zone positioning is robust to changes in activity and structural assembly.

      We thank the reviewer for pointing out these strengths.

      Weaknesses:

      The main limitation is that the study does not test whether the methods used are sensitive enough to detect subtle functional disruption, and no condition tested produces clear disorganization of the endocytic zone. As a result, the conclusion that these zones assemble independently is supported by negative data, without a strong positive control for disassembly or mislocalization.

      We are confident that our methods are sensitive enough to detect changes within synaptic compartments. First, for mouse neurons assessed with STED microscopy, we have demonstrated that we can distinguish between the N- and C-termini of the presynaptic protein Bassoon, which are positioned only a few tens of nanometers apart [4]. We have since consistently resolved the localization of pre- and postsynaptic proteins that are also a few tens of nanometers apart, and have established that genetic manipulations of active zone proteins induce disruptions detectable by STED microscopy [4-12]. Because the periactive zone is larger than the distances we can resolve, we are confident that we can detect changes in this area with sufficient sensitivity. Second, for Drosophila NMJs, we use a carefully validated workflow that assesses the distribution of periactive zone proteins and can detect subtle changes [13]. Unfortunately, no known manipulation leads to periactive zone disassembly and could serve as a positive control, which reflects how little is known in this field. We acknowledge that subtle changes in protein localization may escape the resolution of our microscopy methods or experimental design, but this would not undermine the conclusion that the periactive zone remains assembled across the manipulations we tested. Overall, none of the manipulations we tested induced a detectable disruption of the periactive zone. Naturally, we cannot exclude milder effects, and we have added a limitations section to discuss this possibility and some of the subtle changes we observe.

      This paper addresses a longstanding question in synaptic biology and provides a well-supported boundary on the types of mechanisms that are likely to govern endocytic zone localization. The conclusions are well justified by the data, though additional evidence would be needed to define the assembly mechanism itself.

      We thank the reviewer for the support of the conclusion of our study.

      Recommendations for the authors:

      Reviewing Editor Comments:

      This is a rigorous study that, while presenting largely negative data, delineates the processes that control peri-active zone organization. In addition to the interpretive and technical comments below, we encourage the authors to consider extending this study in two areas. First, examining the activity-dependence of Endophilin, and perhaps other factors, being recruited to the PAZ, where previous research has indicated a positive role for activity. Second, further characterization of the role of miniature release events in potentially contributing to PAZ organization. Overall, this was a rigorous and well-executed study.

      We thank the reviewing editor for this positive assessment of our work.

      Reviewer #1 (Recommendations for the authors):

      (1) The rationale for comparing chronic inhibition to acute depolarization could be more clearly articulated. While this approach may be grounded in prior studies, the physiological consequences of chronic silencing differ markedly from those of transient activity, and these distinctions should be more explicitly addressed in the interpretation of results. For example, might lower intensity, chronic stimulation be a better comparison? Since fixation takes place immediately after stimulation, the time window to capture changes in protein recruitment may be curtailed.

      We thank the reviewer for this comment. The introduction of the manuscript now includes a rationale on lines 110-112. By inhibiting evoked synaptic vesicle fusion throughout the lifespan of neurons, we assessed whether this process is necessary for periactive zone assembly and concluded that it is not a requirement. By acutely depolarizing neurons with 50 mM KCl or with a 40 Hz train of action potentials, we were able to test whether synaptic vesicle fusion triggers the rapid recruitment of endocytic proteins to the periactive zone and concluded that this is not the case for most of the endocytic proteins that we studied. While these results indicate that a constitutive pathway must exist to assemble the periactive zone, we remain agnostic as to whether stimulation paradigms not tested in our study can enhance the deployment of endocytic proteins, especially over long periods of time. This may be the case for low, chronic stimulation, as suggested by the reviewer. We clarify these limitations in a “Limitations and Outlook” section of the discussion (lines 510-543).

      (2) Amphiphysin stood out as the only protein showing notable changes, in opposite directions, after active zone protein knockout or blockers and after Liprin-α knockout. Given the predominance of negative results, it would be valuable to devote more discussion to why Amphiphysin behaves differently. What functional role might it play in this context that sets it apart from other endocytic components?

      As suggested by the reviewer, we have extended the discussion on Amphiphysin. One possible reason why Amphiphysin responds differently to different genetic manipulations or changes in stimulation is that different endocytic proteins might belong to different endocytic submachineries. This is addressed on lines 421-424. On lines 444-449, we further discuss the subtle decrease in the levels of Amphiphysin and AP-180 in Liprin-α mutants. We suggest that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus, and that this link may be partially disrupted in Liprin-α mutants. Overall, we note that Amphiphysin is still localized to the periactive zone at rest, and hence that it fits with the overall model of constitutive deployment that we propose.

      (3) The claim of activity-independence may need to be nuanced. Although the data suggest no recruitment in response to acute stimulation, the subtle changes following chronic inhibition complicate this interpretation, especially when considering redundancy. If activity-dependence is considered bidirectional, these findings might reflect a more complex regulatory mechanism. The interpretation in lines 188-190 more accurately captures this complexity than earlier generalizations.

      We agree with the reviewer that the dependence on activity should be discussed in a nuanced fashion. We have scrutinized the manuscript on this point and state throughout that recruitment is independent of evoked activity and not necessarily of any kind of activity. We believe that this interpretation is accurate because evoked release of neurotransmitter was ablated by the pharmacological and genetic manipulations that we used. Furthermore, we have included a “Limitations of the study” section in the discussion where we openly address that spontaneous fusion of synaptic vesicles cannot be ruled out as a potential mechanism to sustain periactive zone assembly (lines 514-523). Finally, we have expanded on the complexity of periactive zone assembly relative to activity. In particular, homeostasis may contribute to increased levels of endocytic proteins upon chronic blockade of evoked transmission (lines 404-406).

      (4) Given published work on endophilin's role in activity-dependent endocytic recruitment, adding endophilin (at least in the Drosophila NMJ experiments) would be highly informative.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high-frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for the differences between these findings and previous work on Endophilin [3], which we discuss on lines 407-410:

      “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, Bai et al. (2010) found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are compatible with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (5) Line 57 might have a typo in the citation.

      We thank the reviewer for pointing this out. The citations now include: Bai et al., 2010; Jiang et al., 2024; Koh et al., 2007; Winther et al., 2013 and Winther et al., 2015. Please note that the last two citations are grouped as Winther et al. 2013, 2015 following our formatting style.

      (6) Line 208 might be missing a citation that justifies parameters.

      In the revision, this information is discussed on lines 222-224, where we cite our prior work describing these data: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023)”.

      Reviewer #2 (Recommendations for the authors):

      (1) Please consider including, or at least discussing, a well-established activity-dependent endocytic protein (e.g., Endophilin) as a positive control to help contextualize the negative findings.

      We thank the reviewer for the suggestion. We have added experiments to assess the localization of two more proteins at Drosophila NMJs. These proteins are EndoA and Dap160, both of which have been reported to traffic between the synaptic vesicle cloud and the plasma membrane in response to stimulation [1-3]. In line with these studies, we observed that EndoA and Dap160 partially co-localize with a synaptic vesicle marker and with a periactive zone marker, indicating localization to both compartments (Fig. 2). However, neither high-frequency stimulation nor expression of TeNT changed the levels or the distribution of these two proteins at the periactive zone (Fig. 3). Similarly, the deployment of these proteins at the periactive zone at the Drosophila NMJ was not dependent on the active zone scaffold Liprin-α (Fig. 8). Our data indicate that deployment of EndoA and Dap160 to the periactive zone does not require evoked synaptic activity.

      We believe that there are multiple plausible explanations for the differences between our findings and previous work on Endophilin [3], which we discuss on lines 407-410: “Increased synaptic enrichment was also observed for Endophilin at nematode NMJs in mutants with disrupted exocytosis (Bai et al., 2010). We do not see such large shifts in Endophilin following similar manipulations, which might reflect distinct synaptic architectures in the C. elegans dorsal cord vs Drosophila NMJ terminals.” Further, Bai et al. (2010) found that a plasma membrane-tethered Endophilin strongly colocalizes with endocytic machinery and largely rescues function. This suggests that the plasma membrane is the primary functional compartment for Endophilin. Together, all data are consistent with a model in which Endophilin constitutively, but not completely, localizes to the periactive zone.

      (2) Expand the discussion of TeNT's limitations, specifically that it does not block spontaneous fusion or alternative fusion pathways, and consider referencing more stringent tools (e.g., Botulinum toxins or SNARE mutants), even if they weren't used here.

      Following the reviewer’s suggestion, we have included a “Limitations and Outlook” section in the revised version. We state that “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited” (lines 514-515). We further state that, while the manipulations that we included result in decreased spontaneous release, “it is possible that the remaining spontaneous release supports periactive zone assembly” (lines 518-519) and that “Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017)” (lines 520-523).

      (3) We encourage the authors to briefly discuss whether Dynamin might contribute to periactive zone structure beyond its role in membrane fission. Loss-of-function data could be particularly informative in future work.

      We agree with the reviewer that this is an interesting possibility. On lines 454-455, we make the broad point that “interactions between endocytic proteins may further contribute to the anchoring of this apparatus”, and on lines 459-460, we specifically suggest a role for Dynamin by stating that “perturbing interactions between Dynamin-1 and Endophilin-A1 increases the distance between these proteins (Imoto et al., 2024), suggesting their binding has a scaffolding function.”

      (4) Clarify the interpretation of increased endocytic protein levels upon chronic silencing: are these interpreted as homeostatic responses or experimental variability?

      We suggest that these changes might include homeostatic adaptations. On lines 405-406, we note that this increase is similar in magnitude to the increase in active zone proteins following a similar pharmacological manipulation, stating that “a mechanism for this effect might be a homeostatic response (Wen and Turrigiano, 2024) similar in magnitude to the increase in active zone protein levels following activity blockade (Held et al., 2020).”

      (5) The Discussion could be strengthened by sketching out more concrete experimental approaches to test candidate mechanisms (e.g., roles for actin, lipids, adhesion molecules) in organizing periactive zones.

      The potential roles of cell adhesion molecules (lines 430-440), the cytoskeleton, and lipids (lines 442-452) are addressed in the discussion. Furthermore, following the reviewer’s suggestion, we have added the following statement (lines 541-543): “This work builds a foundation to assess alternative mechanisms and models of periactive zone assembly, including roles of the cytoskeleton, lipids, adhesion molecules, and intrinsic endocytic protein interactions”. We hope that the reviewer agrees that the discussion of our paper is not the right format to provide a concrete experimental plan for future work. In our view, the discussion should put the findings of our experiments in the context of the field.

      Reviewer #3 (Recommendations for the authors):

      (1) At a spine synapse, the endocytic zone is estimated to lie 100-200 nm from the active zone. The authors' analysis focuses on a region (0-150 nm) that lies largely outside this zone, raising the question of whether the area studied may be outside of the area affected by the manipulations made. While STED systems claim ~80 nm resolution, this is rarely achieved in practice, and the authors do not report the effective resolution of their system. Reporting the resolution achieved would address this issue. In addition, super-resolution imaging does not appear to have been used at the Drosophila NMJ. The authors should clarify whether resolution limitations influenced the choice of analysis region and whether their imaging approach is sufficient to detect changes in the endocytic zone.

      We believe that it is unlikely that the relevant signals were missed. First, in mouse synapses, most signal corresponding to endocytic proteins was detected inside the selected region of interest. Our rationale to select the area was based on the fact that expanding the region analyzed would have reduced the sensitivity of our approach, as averaging over a larger area would dilute the signal. The resolution of our microscopy should not be a limitation either. In our previous work, we demonstrated that STED microscopy allows discriminating between the N- and C-termini of the presynaptic scaffold Bassoon, which are positioned only a few tens of nanometers apart [4]. This establishes that we can resolve differences at tens of nanometers in a biological context, which is more relevant than the resolution measured with fluorescent beads (which we have repeatedly assessed to be ~80 nm laterally). Subsequently, we have also been consistently able to resolve the localization of pre- and postsynaptic proteins that also localize a few tens of nanometers apart [4-12]. Given that the periactive zone spans a larger area than the distances that we can resolve experimentally in the examples above, we are confident that our measurements are sensitive enough to detect changes in this area.

      Second, for Drosophila NMJs, the region of interest and the overall analysis were chosen following a workflow validated in our previous work [13]. This method analyzes regions both immediately adjacent to and more distant from the active zone, and does not exclude any region based on distance from the active zone, as described on lines 222-224: “Each unit is divided into ‘mesh’ and ‘core’ regions, where the periactive zone mesh is a ~175 nm wide area localized at ~330 nm from the center, and the ‘core’ region is the interior to this mesh (Del Signore et al., 2023).” In our previous study, we analyzed the distribution of periactive zone proteins at rest with STED microscopy and with Airyscan confocal microscopy. The resolution provided by Airyscan is reported to be ~175 nm in XY and ~400 nm in Z, which is sufficient to assess localization to the periactive zone compartment and is not inferior to imaging methods previously used to report changes in the distribution of endocytic proteins (for examples, see [1,2]). In the revised manuscript, we have added new data measuring the levels and distribution of EndoA and Dap160 using STED microscopy (Figure 3 – figure supplement 1). The results acquired with STED microscopy and with Airyscan confocal microscopy are consistent with one another.

      Overall, the accuracy of the imaging methods and analyses used in this study is sufficient to assess periactive zone structure given its size and organization.

      (2) Interestingly, in a number of cases, the authors observe significant differences in endocytic markers (Figure 1q, 4k, 6k, 6r). However, little is made of these differences. The authors should provide more discussion of these changes and how they make sense of them alongside their claims of a lack of effect from their manipulations.

      The reviewer raises a good point. We interpret these changes in two different ways. First, we suggest that changes observed in response to blockade of action potentials or disassembly of the active zone might be homeostatic. This is addressed on lines 135-137. Second, we discuss that the actin cytoskeleton may be the link between the active zone and the endocytic apparatus. Several active zone proteins interact with the actin cytoskeleton. One of them is Liprin-α. This interaction may explain the decrease in the level of Amphiphysin and AP-180 at the periactive zone in Liprin-α null neurons. This is addressed on lines 444-449. We hope that the reviewer agrees that overall, we should focus on the main conclusion that deployment of endocytic proteins persists over a number of manipulations and synapse types.

      (3) The graphs in Figure 1c and 1g, 3g, 4c, 4e, 6c, and 6g do not appear to be identical. If the solid line represents the mean and the lighter color represents the distribution of these data, these data appear to be different from one another. It is surprising that these differences are not significant. What statistical tests were used to determine whether the differences in these graphs are not significant? Is the issue that a relatively low number of synapses was examined (30-60)? Did the authors conduct a power analysis?

      We apologize if the display of our data and analyses was not clear. We do not perform statistical analyses on the line profiles. Instead, we perform them on two values that are extracted from line profiles. These values are (1) the distance between the peak intensity values of the protein of interest and the marker and (2) the peak intensity values. For example, in Figure 1, distances are quantified and statistically analyzed in panel j, and the peak levels are quantified and statistically analyzed in panel k. We have clarified this in the legends of current Figures 1, 4, 5, and 7.
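      To illustrate the distinction between plotting a line profile and testing values extracted from it, the following is a minimal Python sketch of how the two per-synapse values could be obtained from a pair of profiles sampled along the same line. It is not the authors' analysis code: the function names, the pixel size, and the use of the raw maximum (no smoothing, background subtraction, or peak fitting) are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def peak(profile: Sequence[float]) -> Tuple[int, float]:
    """Return (index, value) of the highest intensity in a line profile.

    Hypothetical helper. If several points share the maximum, the first one
    along the line is returned (Python's max keeps the first maximal item).
    """
    if len(profile) == 0:
        raise ValueError("empty profile")
    idx = max(range(len(profile)), key=lambda i: profile[i])
    return idx, profile[idx]


def profile_metrics(protein: Sequence[float],
                    marker: Sequence[float],
                    pixel_nm: float) -> Tuple[float, float, float]:
    """Extract per-synapse values from two line profiles (hypothetical).

    Returns:
      - distance in nm between the protein peak and the marker peak,
      - peak intensity of the protein of interest,
      - peak intensity of the marker.
    pixel_nm is an assumed sampling interval along the line.
    """
    if len(protein) != len(marker):
        raise ValueError("profiles must be sampled along the same line")
    p_idx, p_val = peak(protein)
    m_idx, m_val = peak(marker)
    distance_nm = abs(p_idx - m_idx) * pixel_nm
    return distance_nm, p_val, m_val


if __name__ == "__main__":
    # Toy profiles: protein peaks at index 3, marker peaks at index 1.
    protein = [0, 1, 5, 9, 4, 1]
    marker = [0, 8, 3, 1, 0, 0]
    print(profile_metrics(protein, marker, pixel_nm=20.0))
```

      In this sketch, each synapse contributes one distance and one peak value per channel, and it is these per-synapse values, not the averaged profiles shown in the figures, that would then be compared between conditions.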

      (4) The authors clearly state that their experiments address the role of evoked activity in endocytic zone positioning, but they do not examine whether spontaneous vesicle fusion might play a role. Given the availability of Drosophila mutants that decrease (Doc2, Dunc-13) or increase (syt1) spontaneous release, this is a notable omission. Ideally, these mutants should be examined. And at a minimum, the authors should discuss whether spontaneous release could contribute to endocytic zone organization.

      We agree with the reviewer that spontaneous fusion of synaptic vesicles may contribute to periactive zone organization. Many of the genetic manipulations that we used in mouse neurons result in a significant decrease in spontaneous release. This includes Ca<sub>V</sub>2 triple knockouts with a ~60% decrease in spontaneous fusion [10], RIM+ELKS quadruple knockouts with a ~70% decrease in spontaneous fusion [9] and Liprin-α quadruple knockouts with a ~50% decrease in spontaneous fusion [7]. We cannot rule out that the remaining spontaneous release is sufficient to mediate assembly functions. A conclusive way to address this possibility would be a manipulation that ablates spontaneous release without altering other pathways. However, to our knowledge, no such manipulation is available. The manipulations suggested by the reviewer might suffer from similar limitations, as they would change the frequency of spontaneous release without fully ablating it, and they would also affect evoked release. We have included a limitations section in the discussion where we address this (lines 514-523), specifically stating “conclusions that can be drawn on the roles of spontaneous release in periactive zone assembly remain limited. While many of the manipulations used here, including Ca<sub>V</sub>2 knockout (Held et al., 2020), RIM+ELKS knockout (Tan et al., 2022; Wang et al., 2016) and Liprin-α knockout (Emperador-Melero et al., 2024) in hippocampal neurons, and TeNT expression in fly NMJs (Sweeney et al., 1995), result in 50% to 70% decreased spontaneous release rates, it is possible that the remaining spontaneous release supports periactive zone assembly. Future studies might test manipulations with strong effects on miniature release including those affecting SNARE proteins and their regulators, with the caveat that these manipulations might have effects on upstream trafficking and in some cases on cell survival (Kaeser and Regehr, 2014; Santos et al., 2017).” We hope that the reviewer agrees that assessing these mutants should be a topic of future studies, given that we already test many mutants in the paper.

      (5) In Figures 1 and 6, the authors assess presynaptic protein localization in cultured neurons, but it is unclear whether these are synaptic sites. Many presynaptic proteins traffic together and can accumulate at sites lacking postsynaptic specializations. The authors should validate that the observed spatial organization occurs at bona fide synapses, ideally by co-labeling with postsynaptic markers as done in Figure 4. If methods like these were used, providing more details on how synapses were identified and selected would be useful to the reader.

      While we understand the reviewer’s point, we are confident that the structures analyzed are bona fide synapses for three reasons, as we have established before across many papers [4-8,10-12,17].

      The diameter of the structures detected using the synaptic vesicle marker Synaptophysin aligns much more closely with the size of the large vesicle clusters found at presynaptic terminals than with that of a few transport vesicles.

      In side-view synapses, the active zone marker (Bassoon or Munc13-1) forms a bar-like structure at one edge of the vesicle cloud, consistent with the architecture of synapses.

      Synaptophysin is one of our key markers for detecting synapses. In our cultures, most of the Synaptophysin signal colocalizes with postsynaptic markers (either PSD-95 or Gephyrin), as we have established across many studies [4,7-12]. This indicates that the markers used here are sufficient to select synapses. Furthermore, the frequency at which synapses were identified using an active zone marker as the second marker was similar to that observed when using a postsynaptic marker, suggesting that we were not randomly including unrelated structures.

      (6) Many of the images, particularly of the Drosophila NMJ, are of low quality and are shown in very small images. In addition, the quality of the images throughout the paper makes it difficult to assess the authors' analysis and results. The authors should provide larger, higher-quality images that show examples representative of the means for each of the conditions shown. This is an issue for most of the figures, but is particularly prominent in the dNMJ. A minor additional point is that the authors should be clear whether the dNMJ images are collected at super-resolution or using a conventional microscope.

      We believe that the quality of our images is sufficient for the assessments made for the following reasons:

      These images were acquired with enough spatial resolution to assess levels at the PAZ as discussed in response to this reviewer’s first comment. In our previous work, we used images acquired at the same resolution and presented in the same manner for both mouse hippocampal synapses [6,7] and Drosophila NMJs [13,18]. In those previous studies, we drew conclusions at a similar level of detail as in the current study.

      In our view, our representative images are not inferior in quality to other papers in the field addressing similar questions [1,2,19,20].

      We have selected sample images based on the quantified mean values per condition. Hence, we strived to select panels that are objectively representative regarding the quantified parameters.

      We have specified microscopy methods in the figure legends. Specifically, for Drosophila NMJs, we used Airyscan confocal microscopy and STED microscopy. For each experiment, it is now stated which microscopy method was used in the corresponding legend.

      References:

      (1) Winther, Å. M. E. et al. An Endocytic Scaffolding Protein together with Synapsin Regulates Synaptic Vesicle Clustering in the Drosophila Neuromuscular Junction. J Neurosci 35, 14756–14770 (2015).

      (2) Winther, Å. M. E. et al. The dynamin-binding domains of Dap160/intersectin affect bulk membrane retrieval in synapses. J Cell Sci 126, 1021–1031 (2013).

      (3) Bai, J., Hu, Z., Dittman, J. S., Pym, E. C. G. & Kaplan, J. M. Endophilin functions as a membrane-bending molecule and is delivered to endocytic zones by exocytosis. Cell 143, 430–441 (2010).

      (4) Wong, M. Y. et al. Liprin-alpha3 controls vesicle docking and exocytosis at the active zone of hippocampal synapses. Proc Natl Acad Sci U S A 115, 2234–2239 (2018).

      (5) Emperador-Melero, J., de Nola, G. & Kaeser, P. S. Intact synapse structure and function after combined knockout of PTPδ, PTPσ, and LAR. Elife 10, (2021).

      (6) Emperador-Melero, J. et al. PKC-phosphorylation of Liprin-α3 triggers phase separation and controls presynaptic active zone structure. Nat Commun 12, 3057 (2021).

      (7) Emperador-Melero, J. et al. Distinct active zone protein machineries mediate Ca2+ channel clustering and vesicle priming at hippocampal synapses. Nat Neurosci (2024) doi:10.1038/s41593-024-01720-5.

      (8) Tan, C., Wang, S. S. H., de Nola, G. & Kaeser, P. S. Rebuilding essential active zone functions within a synapse. Neuron 110, 1498-1515.e8 (2022).

      (9) Wang, S. S. H. et al. Fusion Competent Synaptic Vesicles Persist upon Active Zone Disruption and Loss of Vesicle Docking. Neuron 91, 777–791 (2016).

      (10) Held, R. G. et al. Synapse and Active Zone Assembly in the Absence of Presynaptic Ca(2+) Channels and Ca(2+) Entry. Neuron 107, 667-683.e9 (2020).

      (11) Chin, M. & Kaeser, P. S. The intracellular C-terminus confers compartment-specific targeting of voltage-gated calcium channels. Cell Rep 43, 114428 (2024).

      (12) Nyitrai, H., Wang, S. S. H. & Kaeser, P. S. ELKS1 Captures Rab6-Marked Vesicular Cargo in Presynaptic Nerve Terminals. Cell Rep 31, 107712 (2020).

      (13) Del Signore, S. J., Mitzner, M. G., Silveira, A. M., Fai, T. G. & Rodal, A. A. An approach for quantitative mapping of synaptic periactive zone architecture and organization. Mol Biol Cell 34, (2023).

      (14) Sweeney, S. T., Broadie, K., Keane, J., Niemann, H. & O’Kane, C. J. Targeted expression of tetanus toxin light chain in Drosophila specifically eliminates synaptic transmission and causes behavioral defects. Neuron 14, 341–351 (1995).

      (15) Kaeser, P. S. & Regehr, W. G. Molecular mechanisms for synchronous, asynchronous, and spontaneous neurotransmitter release. Annu Rev Physiol 76, 333–363 (2014).

      (16) Santos, T. C., Wierda, K., Broeke, J. H., Toonen, R. F. & Verhage, M. Early Golgi Abnormalities and Neurodegeneration upon Loss of Presynaptic Proteins Munc18-1, Syntaxin-1, or SNAP-25. Journal of Neuroscience 37, 4525–4539 (2017).

      (17) de Jong, A. P. H. et al. RIM C2B Domains Target Presynaptic Active Zone Functions to PIP2-Containing Membranes. Neuron 98, 335-349.e7 (2018).

      (18) Del Signore, S. J. et al. An autoinhibitory clamp of actin assembly constrains and directs synaptic endocytosis. Elife 10, (2021).

      (19) Imoto, Y. et al. Dynamin 1xA interacts with Endophilin A1 via its spliced long C-terminus for ultrafast endocytosis. EMBO J (2024) doi:10.1038/S44318-024-00145-X.

      (20) Imoto, Y. et al. Dynamin is primed at endocytic sites for ultrafast endocytosis. Neuron 110, 2815-2835.e13 (2022).

    1. eLife Assessment

      This is a valuable report describing tracheal terminal cells (TTCs) in Drosophila as an immune privileged organ. The authors demonstrated that TTCs lack expression of the membrane-associated peptidoglycan recognition receptor PGRP-LC, which protects these cells from immune pathway activation and JNK-mediated cell death to maintain TTC homeostasis. While the genetic experiments using RNAi and overexpression are convincing and solid, the broader biological significance of this phenomenon requires further investigation. This work will be of interest to researchers in innate immunity across various model systems.

    2. Reviewer #1 (Public review):

      Summary:

      In their manuscript entitled "Terminal tracheal cells of Drosophila are immune privileged to maintain their Foxo-dependent structural plasticity", Bossen and colleagues determine that the terminal cells of the tracheal system differ from other larval tracheal cells in that they do not typically show an Imd-dependent immune response to fungal and viral infections. The authors reach this conclusion based on the expression of a reporter line, Drs-GFP. The authors speculate that this difference may reflect differential expression of an immune pathway component, as tracheal terminal cells (ttcs) do not respond to forced expression of PGRP-LE. The authors then go on to show that, unlike the other cells of the tracheal system, terminal cells do not express PGRP-LC as reported by a GAL4 enhancer trap. Forced expression of PGRP-LC in terminal cells resulted in reduced branching, cell damage and features of the cell death program. These effects could be suppressed by depletion of AP-1 or Foxo transcription factors. The authors show that Foxo plays a negative role in branching of ttcs, with ectopic branching occurring upon RNAi (or under hypoxic conditions). The authors speculate that immune privilege of the ttcs may have evolved to permit Foxo regulation of ttc branching.

      Strengths:

      The authors provide compelling genetic data that support their overall conclusions.

      Weaknesses:

      Fusion cells (FCs) do not appear to express the Drs reporter in Figure 1 or elsewhere, raising the question of whether fusion cells are also immune privileged.

      In Fig 5, TRE-RFP expression is convincing in wild-type ttcs, but not in ttcs overexpressing PGRP-LCx.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Bossen et al. looked at the immune status of the tracheal terminal cells (TTCs) in Drosophila larvae. The authors propose that these cells do not show PGRP-LCx expression and, hence, lack immune function. Artificial overexpression of PGRP-LCx in the TTCs causes these cells to undergo apoptosis.

      Strengths:

      Only a few groups have tried to look at the immune status of the trachea, though we know that AMPs are expressed there after infection. This exciting study attempts to understand the differences in the tracheal cells that do not produce AMPs upon infection.

      Weaknesses:

      The reason why the TTCs have some immune privilege is still not completely clear. Whether the phenotype is cell autonomous or contributes to the cellular immune system is not evaluated. As we know, crystal cells also maintain oxygen levels in larvae; whether crystal cells play any role in the absence of terminal tracheal branches is not explored.

      My particular comments on the figures are as follows:

      (1) In Figure 2, the PGRP-LCx signal should be quantified as done for Drosomycin GFP, as shown in Figure 1.<br /> - The authors have now done this.

      (2) In Fig 2F and G are the larvae infected? If not, what happens to PGRP-LCx expression post Ecc15 infection?<br /> - The authors have answered this question, saying infection has no effect on TTCs' Dr-GFP expression.

      (3) Is the effect of overexpression of LCx exaggerated post-infection? In particular, when it comes to the escape phenotype.<br /> - This was not done; the infection experiment was done with PGRP-LE overexpression.

      (4) Does overexpression of anti-apoptotic genes in TTC and PGRP-LCx rescue the TTC branching?<br /> - This was not done.

      (5) Have the authors tried to rescue the larvae with shallow food?<br /> - This was not done.

      (6) Is there any effect on circulating hemocytes or the lymph gland in PGRP-LCx-overexpressing animals?
      - This was not done.

    4. Reviewer #3 (Public review):

      Summary:

      The authors report that tracheal terminal cells (TTCs) in Drosophila do not activate innate immunity following bacterial infection, and attribute this to the absence of PGRP-LCx expression in these cells. Forced activation of the Imd pathway in TTCs leads to JNK-mediated cell death and reduced tracheal branching. The authors propose that this immune-privileged status preserves Foxo-dependent structural plasticity, which is essential for TTCs to respond to changing environmental conditions such as hypoxia.

      Strengths:

      The revised manuscript represents a meaningful improvement over the initial submission. The addition of multiple antimicrobial peptide reporters substantially strengthens the key observation that TTCs do not mount a humoral immune response upon infection, moving beyond reliance on the Drs-GFP reporter alone. The mechanistic dissection of the cell death pathway - demonstrating roles for JNK, AP-1, and Foxo downstream of ectopic PGRP-LCx activation - is well-executed and provides solid mechanistic insight. The inclusion of a second, independent UAS-PGRP-LCx line with a milder phenotype adds useful calibration. The hypoxia sensitivity assays provide physiological context, and the discussion of the gradient hypothesis, while based on qualitative observation, is logically reasoned and addresses a legitimate alternative interpretation.

      Weaknesses:

      The primary remaining concern is that the absence of PGRP-LCx expression in TTCs is supported by a single GAL4 enhancer trap line, without independent validation by complementary methods such as in situ hybridization, antibody staining, or reanalysis of publicly available single-cell transcriptomic data. The authors acknowledge this limitation transparently. While the convergent evidence from infection experiments - in which neither the Drs-GFP reporter nor the PGRP-LCx-Gal4 line shows TTC activation - lends indirect support, orthogonal confirmation would more definitively establish this mechanistic claim.

      Additionally, the finding that Dcp-1 cleavage occurs in non-TTC tracheal cells as well suggests that Imd-mediated apoptotic signaling is not uniquely restricted to TTCs, and the Discussion could more explicitly address what distinguishes the TTC response in terms of degree or cellular context.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In their manuscript entitled "Terminal tracheal cells of Drosophila are immune privileged to maintain their Foxo-dependent structural plasticity", Bossen and colleagues determine that the terminal cells of the tracheal system differ from other larval tracheal cells in that they do not typically show an Imd-dependent immune response to bacterial infection. The authors reach this conclusion based on the expression of a reporter line, Drs-GFP. The authors speculate that this difference may reflect differential expression of an immune pathway component, as tracheal terminal cells (TTCs) do not respond to forced expression of PGRP-LE. The authors then go on to show that, unlike the other cells of the tracheal system, terminal cells do not express PGRP-LC as reported by a GAL4 enhancer trap. Forced expression of PGRP-LC in terminal cells resulted in reduced branching, cell damage, and features of the cell death program. These effects could be suppressed by depletion of the AP-1 or Foxo transcription factors. The authors show that Foxo plays a negative role in the branching of TTCs, with ectopic branching occurring upon Foxo RNAi (or under hypoxic conditions). The authors speculate that the immune privilege of the TTCs may have evolved to permit Foxo regulation of TTC branching.

      Strengths:

      The authors provide compelling genetic data.

      Weaknesses:

      (1) The authors state that after infection 34% of larvae were not GFP+, as defined by the detection of Drs-GFP in dorsal branches. The authors should clarify whether these larvae completely lack a response to infection, with no Drs-GFP in dorsal trunks or other tracheal branches. If these larvae are entirely unresponsive, could the authors indicate why this might be? Also, at this point in the manuscript, the authors are somewhat misleading regarding TTC expression of Drs-GFP: they should state here that some TTCs do express Drs-GFP, and should also address their prior study of Drs-GFP induction, which does not claim that Drs-GFP expression is excluded from TTCs.

      GFP– indicates the absence of detectable fluorescence in regions proximal to the TTCs (dorsal branch and fusion cells). Our analysis specifically focused on these regions and did not assess fluorescence in other parts of the tracheal system. Therefore, the reported 34% of larvae classified as GFP– does not imply a complete absence of response in these animals; rather, no fluorescence was detected within our defined region of interest. To clarify how fluorescence in TTCs was quantified, we have added a schematic (new Fig. 1F). In addition, new Fig. S1 illustrates that AMP reporter activation frequently occurs in other tissues.

      Our observations are consistent with earlier reports. In the original description of the AMP reporter lines, Tzou et al. (2000; https://doi.org/10.1016/S1074-7613(00)00072-8) reported that “only a fraction of the flies or larvae exhibited fluorescence in surface epithelia, and the proportion of GFP-expressing animals was variable from one culture vial to the next. In addition, fluorescence was rarely distributed throughout the whole tissue and was limited to restricted areas of the epithelium,” suggesting that AMP reporter activation can occur locally rather than uniformly across tissues.

      In a previous study (https://doi.org/10.1186/1471-2164-9-446), we reported that airway epithelial cells, including the finest tracheal endings on target organs, can activate drosomycin transcription following infection. However, that study focused specifically on infected larvae. Importantly, it did not quantify the frequency of reporter activation or analyze TTC-specific phenotypes. As such, those statements should not be interpreted as implying uniform or ubiquitous reporter activation across all tracheal cells.

      (2) The authors describe the terminal cell phenotype as "shrunken", but this implies a loss of size or pruning; it is not clear whether the defects could equally be due to a lack of growth or slower growth.

      We omitted the term “shrunken” in the present manuscript to avoid potential misinterpretation.

      (3) Figure 1 suggests that Drs-GFP expression in GFP+ dorsal branches is not uniform but rather patchy. The authors should define the fraction of dorsal branch cells that are Drs-GFP positive. Also, are fusion cells Drs-GFP positive?

      We included a schematic illustrating our quantification approach (new Fig. 1F). We also revised the wording to clarify that GFP<sup>+</sup> animals include fluorescence not only in the dorsal branch (DB) but also in fusion cells (FCs), i.e., structures located between the dorsal trunks and the terminal tracheal cells (TTCs). Any structure in proximity to the TTCs that shows GFP expression was scored as GFP<sup>+</sup>. In most cases, GFP expression was observed in the dorsal fusion cells.

      (4) Drs-GFP expression is largely absent from terminal cells; however, a non-negligible number of terminal cells (8%) still show expression. The authors argue that PGRP-LC expression is absent based on a GAL4 transgenic line. If this line reflects endogenous PGRP-LC expression, should there not be 8% positive TTCs? Or is the 8% Drs-GFP expression independent of the Imd receptor?

      We detected PGRP-LE expression in approximately 3% of epithelial tracheal cells that expressed Drs after infection (Fig. 3F,G). This observation suggests that Drs activation can occur through a mechanism independent of PGRP-LCx. We have incorporated this finding into both the Results and Discussion sections.

      (5) Figure 2: the authors state that TTCs are negative even with induced PGRP-LE expression. Should there not be at least 8% that are positive?

      We additionally infected larvae overexpressing PGRP-LE and observed Drs-GFP expression in 3% of cases, which we did not observe without infection.

      (6) The authors compare the effects of PGRP-LC expression to the induction of cell death by expression of reaper and hid. Reaper and Hid had stronger effects and eliminated TTCs. Cleavage of the caspase Dcp-1 is seen in PGRP-LC-expressing cells. Is caspase cleavage always diagnostic of apoptosis, or could the phenotype, which is weaker than that caused by rpr/hid, imply a different function?

      We have included the potential non-apoptotic functions of Dcp-1 in the Discussion. The weaker phenotype observed could therefore be explained by a non-apoptotic role of Dcp-1.

      (7) Drs-GFP expression is said to be "completely" absent from tracheal terminal cells when the entire tracheal system is expressing PGRP-LE.

      We have revised the wording accordingly.

      (8) Figure 5: the TRE_RFP expression data do not convincingly show that expression is higher in, or even present in, terminal cells. https://doi.org/10.7554/eLife.102369.1.sa2

      We have revised the wording in line 230.

      Reviewer #2 (Public review):

      Summary:

      In this study, Bossen et al. looked at the immune status of the tracheal terminal cells (TTCs) in Drosophila larvae. The authors propose that these cells do not show PGRP-LCx expression and, hence, lack immune function. Artificial overexpression of PGRP-LCx in the TTCs causes these cells to undergo apoptosis.

      Strengths:

      Only a few groups have tried to look at the immune status of the trachea, though we know that AMPs are expressed there after infection. This exciting study attempts to understand the differences in the tracheal cells that do not produce AMPs upon infection.

      Weaknesses:

      The reason why the TTCs have some immune privilege is still not completely clear. Whether the phenotype is cell autonomous or contributes to the cellular immune system is not evaluated. As we know, crystal cells also maintain oxygen levels in larvae; whether crystal cells play any role in the absence of terminal tracheae is not explored. https://doi.org/10.7554/eLife.102369.1.sa1

      In addition to the Drs-GFP reporter line, we performed new infection experiments using additional antimicrobial peptide reporters to further support our observations. While these experiments confirm our observations on the humoral immune response, they do not address the mechanisms underlying the apparent immune privilege. Our analysis therefore focuses specifically on the humoral immune response and does not allow conclusions regarding potential contributions of the cellular immune system, including crystal cells, to maintaining oxygen levels in animals with impaired TTCs. Notably, complete loss of TTCs is lethal, as demonstrated by TTC ablation using hid;rpr expression (Fig. 4F).

      Reviewer #3 (Public review):

      Summary:

      The authors report that tracheal terminal cells (TTCs) in Drosophila do not activate innate immunity following bacterial infection. They attribute this to the lack of expression of PGRP-LCx in these cells. Forced activation of the Imd pathway in TTCs leads to cell death and a reduction in tracheal branching. The authors propose a mechanism for cell death induction via pathways involving JNK, AP-1, and foxo. They suggest that the suppression of innate immunity in TTCs may serve to maintain their plasticity, preparing them for responses to hypoxic conditions.

      Strengths:

      (1) The study addresses the understudied area of immune privilege in innate immunity, providing a potentially important example in Drosophila TTCs.

      (2) The molecular characterization of the cell death pathway induced by forced Imd activation is well-executed and provides solid mechanistic insights.

      (3) The authors draw interesting parallels between Drosophila TTCs and mammalian endothelial cells, suggesting broader implications for their findings.

      Weaknesses:

      (1) The core premise of the study - that TTCs do not activate innate immunity following bacterial infection - relies heavily on a single readout (Drs reporter). Additional markers of immune activation would strengthen this crucial claim.

      We included new experiments using additional antimicrobial peptide reporter genes that show results similar to those obtained with the Drs-GFP reporter (new Fig. 1).

      (2) The evidence for the lack of PGRP-LCx expression in TTCs is based on a single GAL4 reporter line. Given the importance of this observation to the authors' model, validation using alternative methods would be beneficial.

      Although we were not able to include alternative methods to further confirm our hypothesis, we performed additional infection experiments. Upon bacterial infection, we observed a strong increase in GFP fluorescence throughout the animal and in many other tissues, while still detecting no response in the TTCs. These results further support our hypothesis.

      (3) The phenotypes observed upon forced activation of the Imd pathway in TTCs, while intriguing, may be influenced by non-physiological levels of pathway activation. The authors should address this potential caveat and consider examining the effects of more moderate pathway activation. https://doi.org/10.7554/eLife.102369.1.sa0

      We used two independent UAS-PGRP-LCx lines located on different chromosomes. One line (III) produced a stronger phenotype than the other (II). We clarified this point in the Results section (Fig. 4C,D) and added supplementary data (new Fig. S2) showing that both lines produce comparable phenotypes when expressed using an alternative tracheal driver. The epithelial thickening observed follows the same pattern as the phenotype detected in TTCs, indicating that even moderate pathway activation leads to similar effects. However, we acknowledge that this represents ectopic pathway activation and therefore likely reflects a non-physiological level of signaling.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      My particular comments on the figures are as follows:

      (1) In Figure 2, the PGRP-LCx signal should be quantified as done for Drosomycin GFP, as shown in Figure 1.

      We agree and have added a quantification.

      (2) In Figure 2F and G are the larvae infected? If not, what happens to PGRP-LCx expression post Ecc15 infection?

      We also included infected larvae to test whether infection induces GFP expression in TTCs. However, GFP expression was never observed in TTCs, although overall fluorescence increased in other tissues.

      (3) Is the effect of overexpression of LCx exaggerated post-infection, in particular when it comes to the escape phenotype?

      We induced mild Imd pathway activation by expressing PGRP-LE using a tracheal driver active in all tracheal cells, including TTCs, for 24 hours. In addition, these larvae were infected and their sensitivity to hypoxia was assessed. Animals expressing PGRP-LE in the trachea showed increased sensitivity to hypoxia, which was further enhanced following infection.

      (4) Does overexpression of anti-apoptotic genes in TTC and PGRP-LCx rescue the TTC branching?

      This point was not addressed.

      (5) Have the authors tried to rescue the larvae with shallow food?

      This point was not addressed.

      (6) Is there any effect on the circulating hemocytes or lymph glands in the PGRP-LCx-overexpressing animals?

      This point was not addressed.

      Reviewer #3 (Recommendations for the authors):

      The authors present an intriguing model of immune privilege in Drosophila tracheal terminal cells (TTCs). This model is built upon three key pillars: (1) the absence of innate immune activation in TTCs, (2) the lack of PGRP-LCx expression in TTCs, and (3) the induction of cell death when innate immunity is activated in TTCs. However, the experimental evidence supporting each of these critical points requires substantial strengthening. The reviewer recommends the following improvements and additional experiments to address these core issues:

      (1) Innate immune activation in TTCs:

      Evaluate the expression of additional antimicrobial peptide reporters to provide a more comprehensive assessment of innate immune activation in TTCs.

      In addition to the Drs-GFP reporter line, we performed new infection experiments using other antimicrobial peptide reporters to confirm our results.

      (2) PGRP-LCx expression in TTCs:

      Validate the PGRP-LCx-GAL4 line used in the study to ensure it accurately reflects endogenous PGRP-LCx expression.

      Employ complementary techniques such as in situ hybridization and antibody staining to corroborate the absence of PGRP-LCx in TTCs.

      We also included infection experiments using PGRP-LCx-Gal4 larvae. Infection did not trigger GFP expression in TTCs. However, the overall PGRP-LCx expression pattern observed in other larval tissues supports the conclusion that the reporter reflects endogenous PGRP-LCx expression.

      (3) Cell death induction upon immune activation in TTCs:

      Address the possibility that the observed cell death is an artifact of strong, forced Imd pathway activation. To do that, perform control experiments activating the Imd pathway in non-TTC tracheal cells to determine if cell death is specific to TTCs.

      Use broader tracheal drivers (e.g., ppk4-GAL4 or btl-GAL4) to activate the Imd pathway and verify if cell death is indeed restricted to TTCs.

      We included results from PGRP-LCx overexpression using the tracheal driver ppk4-Gal4 and stained for the apoptosis marker Dcp-1 (new Fig. S3). We observed increased Dcp-1 signal in dorsal trunk cells, indicating that PGRP-LCx-mediated Dcp-1 cleavage is not restricted to TTCs.

      Ideally, generate a transgenic line expressing physiological levels of PGRP-LCx in TTCs and demonstrate that bacterial infection induces cell death specifically in TTCs through the proposed pathway. The reviewer acknowledges the complexity of this experiment but believes it would significantly strengthen the authors' conclusions.

      We did not generate a new transgenic line but instead used an alternative UAS-PGRP-LCx line (II), which exhibits a milder phenotype. This has now been clarified more prominently in the Results section (Fig. 4C,D). Additionally, we performed further experiments showing an epithelial thickening phenotype whose severity depends on the UAS-PGRP-LCx line used (new Fig. S2).

      In addition to the above major points:

      (4) Quantitative data presentation:

      Provide quantitative analyses for the results presented in Figures 2 and 3J-K to allow for a more rigorous evaluation of the data.

      We included a quantitative analysis of the results shown in Fig. 2 (now presented in new Fig. 3). In addition, we added quantification of fluorescence in the TTCs of infected larvae.

      (5) Alternative hypothesis:

      Consider and address an alternative explanation for the lack of innate immune activation in TTCs: the potential gradient of bacterial ligands from proximal trachea to distal TTCs. If this hypothesis is correct, one might expect to see a gradient of Drs expression correlating with the distance from the proximal trachea. Addressing this possibility would strengthen the authors' proposed model.

      We have now included the following paragraph in the Discussion section:

      “An alternative explanation for the observed lack of an immune response in TTCs could be their maximal distance from the spiracles. In this scenario, a gradient of bacterial inducers along the tracheal system might be expected, resulting in a gradual decrease in immune activation from the spiracles toward the TTCs. However, this is not what we observed. In tracheae that displayed an immune response, the response was largely homogeneous along the entire length of the tracheal system, from the spiracles to the TTCs. Only at the transition to the TTCs did the immune response drop abruptly. This observation argues against the gradient hypothesis and suggests that TTCs are specifically excluded from the immune response.”

    1. eLife Assessment

      By screening an FDA-approved small-molecule library against a leucine-dependent M. tuberculosis strain, this study identifies semapimod as an inhibitor of Mtb growth that functions by impairing leucine import. The work is useful in linking leucine uptake to cell wall lipid biology in Mtb. However, the mechanistic understanding remains incomplete. Additional experimental evidence is required to clarify how PDIM contributes to or regulates leucine uptake.

    2. Reviewer #3 (Public review):

      Agarwal et al. identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature upon semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a leucine-depleted medium.

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      A mechanistic gap still exists in the model of semapimod antitubercular activity. The basis for semapimod activity is that the leucine auxotroph strain cannot acquire leucine from its environment, and thus the bug ceases to grow. Under normal growth conditions, the leucine auxotroph strain produces PDIM and acquires exogenous leucine through some mechanism (either through a transporter or through PDIM). Semapimod binding to PpsB causes the cell to alter its PDIM profile (experimental evidence for this is lacking), and with the altered PDIM profile the cell can no longer acquire enough exogenous leucine to sustain growth (either because the altered PDIM profile interferes with leucine transporter activity or with PDIM-mediated uptake). Acquiring a mutation in ppsB results in cells that are unable to produce PDIM (there is some evidence supporting this) but can now acquire enough exogenous leucine to sustain growth. I cannot find the connection between cells that have normal PDIM with normal leucine uptake and cells that are missing PDIM with normal leucine uptake.

      (1) The manuscript would benefit from adding additional antibiotic controls to experiments. With the current experimental approaches, it is unclear if these signatures are the result of semapimod specifically or the effect of an antimicrobial agent. Adding additional strains to the 2D TLC experiments could provide more confidence in the absence or modifications of the PDIM band.

      (2) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or modified PDIM profiles, testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. Cells might recover growth in the presence of semapimod treatment if enough leucine is provided in the media and some fraction is able to enter the cell through the impaired PDIM barrier.

    3. Reviewer #4 (Public review):

      Summary:

      In this study, the authors screened an FDA-approved repurposed library of small-molecule inhibitors against the auxotrophic strain Mtb mc2 6206 and found that semapimod exclusively inhibited its growth. Further studies showed that it inhibits L-leucine uptake by interacting with PpsB, although the exact mechanism remains unknown. Interestingly, semapimod showed antibacterial activity against H37Rv only in vivo, not in vitro, suggesting a dependence on host-derived exogenous leucine during intracellular growth. This work therefore suggests that uptake of host-derived leucine can be targeted as an effective strategy to reduce intracellular survival of Mtb.

      Strengths:

      The authors have used different approaches to understand the mechanism of L-leucine uptake in Mtb. To start, they conducted an in vitro screen using an FDA-approved library, followed by transcriptomic and metabolic analyses of different Mtb mutants. Through whole-genome sequencing, they identified mutations conferring resistance to semapimod to gain further mechanistic understanding. This led to the analysis of the semapimod-PpsB interaction by BLI-Octet and analysis of cell-wall apolar lipids, which explained how PDIM loss resulted in sensitivity to vancomycin. Finally, infection experiments in mice surprisingly showed that semapimod was effective against intracellular Mtb in vivo but not in vitro.

      Weakness:

      The major weakness of this study is that it is unclear what role PpsB plays in L-leucine uptake. It is also not clear why intracellular Mtb relies on exogenous leucine rather than endogenous leucine. Does intracellular Mtb lose its ability to synthesize leucine, which is why semapimod is active in vivo but not in vitro? Alternatively, semapimod may have other effects on host immunity that have not been explored. I have a few minor comments, which are as follows:

      (1) Authors state that "The colony forming unit (CFU) estimation further shows a bactericidal activity of this molecule which causes 88% reduction of bacterial viability on day 2 and >99% reduction after 5 days of incubation" (Fig. 1d). However, this is only true when compared to the untreated control. Compared to the Day 0 control, treated bacteria appear to have undergone little or no change, suggesting that the compound is bacteriostatic, not bactericidal. The drug concentration used for Fig 1d is not mentioned. For Fig. 1e, there is no day 0 control, and the comparison is with the untreated control at Day 6, which again does not suggest bactericidal action of Semapimod.

      (2) The authors report that "Notably, no cytotoxic effect was observed at this concentration against THP1, thus ruling out the possibility of cell lysis by semapimod," but the data are not shown. Similarly, the authors state that "As a control, interaction of semapimod was also analyzed with the purified Ppe60, which fails to exhibit any binding," but the data are not shown.

      (3) Line 235: change "promote" to "promoter".

    4. Reviewer #5 (Public review):

      Summary:

      The authors have extensively characterized the response of the leucine and pantothenate auxotroph Mtb strain H37Rv mc2 6206 to an FDA-approved compound library and identified semapimod, which is, at best, bacteriostatic in its action against the pathogen. The authors have used transcriptional profiling, metabolite quantification, and screening of genetically resistant mutants to identify changes in leucine uptake under semapimod exposure. Based on these data, the authors attribute changes in antibiotic susceptibility to differences in environmental leucine availability and bacterial PDIM architecture. While the work presents an interesting avenue of investigation of metabolite uptake and utilization in a comparative fashion between fully virulent and auxotroph Mtb strains, it lacks clear and direct evidence to link the observations with a mechanistic explanation.

      Strengths:

      The authors used a well-designed screening strategy for FDA-approved compounds against a metabolically defined strain, followed by characterization of semapimod exposure through RNA-seq and pathway analysis, metabolomics, and time-course analysis of drug effects. The data are interpreted in an interesting way to identify a phenotypic connection between PDIM and altered drug susceptibility.

      Weaknesses:

      The major gap in the study is the speculative nature of the mechanism underpinning the connection between PDIM architecture and changes in leucine uptake under various bacterial growth conditions.

      (1) Despite claims of identifying a "novel leucine uptake mechanism", the authors only provide endpoint metabolite measurements rather than kinetic leucine transport studies.

      (2) A clear explanation for the differences in susceptibility between auxotroph and fully virulent Mtb strains through changes in "PDIM architecture" is not supported by any direct evidence such as structural analysis, lipidomics, or direct measurement of PDIM architectural changes.

      (3) Figures 1D (lines 110-112, "kills bacteria") and 7c (lines 283-285) are used to infer a bactericidal role of semapimod, which may be a mischaracterization of drug activity. In both cases, the CFU trend indicates an absence of bacterial growth rather than a reduction in CFUs, so the activity should be interpreted as "bacteriostatic" at best. These observations would in fact align with the general antibiotic/stress response signature identified by RNA-seq, in which leucine transport-related genes are only a small subset of many dysregulated genes. How do the authors disentangle these generic signatures from the leucine transport evidence, other than by endpoint metabolite quantification?

      (4) Furthermore, the studies showing that complementation with leuCD (but not panCD) rescues the strain from semapimod susceptibility are not supported by a clear mechanistic link. The complementation of leuCD does not completely rescue growth; does this indicate differences in uptake and metabolism? The authors should test this by monitoring the growth of the strains in minimal medium in the presence and absence of exogenous leucine.

      (5) It remains unclear whether the authors attribute leucine uptake differences to a loss of PDIM or to changes in PDIM amount and architecture. No direct evidence is provided for differences in PDIM production between the WT H37Rv strain and the auxotroph mc2 6206 strain used in this study. Mulholland et al. (2024) report similar PDIM levels for WT and auxotrophic Mtb (mc2 6206) in their stocks passaged to maintain PDIM. This could change for stocks maintained differently. Since the presence of PDIM has classically been used to explain a penetration barrier for small molecules, and the schematic provided by the authors at the end of the manuscript (Figure 8c) suggests free leucine penetration in the absence of PDIM, how do the authors explain, through direct experimental evidence, the increased leucine uptake and sensitivity of a PDIM-positive auxotroph to semapimod? Further on the point of PDIM production, the WT auxotroph strain seems to produce limited amounts of PDIM, as evidenced by the TLC data in Figure 6b. To solidify this point, the authors should test other point mutants (not attenuated for growth) for PDIM production by TLC and quantify these differences. These data should be compared with PDIM production in the WT Mtb H37Rv strain (used by the authors) under in vitro growth conditions. A comparative lipidomics analysis of cell envelope components might be insightful in explaining these differences. I believe answering this query is crucial and within the scope of a work whose central claim is the identification of a novel leucine uptake mechanism. It would be interesting, in fact, to identify a novel transporter associated with the PDIM layer of the cell envelope.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review): 

      Summary:

      In this manuscript, the authors used a leucine/pantothenate auxotrophic strain of Mtb to screen a library of FDA-approved compounds for their antimycobacterial activity and found significant antibacterial activity of the inhibitor semapimod. In addition to alterations in pathways, including amino acid and lipid metabolism and transcriptional machinery, the authors demonstrate that semapimod treatment targets leucine uptake in Mtb. The work presents an interesting connection between nutrient uptake and cell wall composition in mycobacteria.

      Strengths:

      (1a) The link between the leucine uptake pathway and PDIM is interesting but has not been characterized mechanistically. The authors discuss that PDIM presents a barrier to the uptake of nutrients and show binding of the drug to PpsB. However, it is unclear why only the leucine uptake pathway was affected.

      We observe interference with L-leucine uptake, but not with pantothenate uptake, in the mc2 6206 strain upon semapimod treatment. At present, we do not know whether PDIM presents a barrier exclusively to the uptake of L-leucine. Further studies may shed light on the mechanism(s) by which this small molecule modulates L-leucine uptake.

      (1b) We still do not know what PpsB actually does for amino acid uptake - is it a transporter?

      Using BLI (Octet), we did not detect any interaction between L-leucine and PpsB. Therefore, we doubt that PpsB is an L-leucine transporter.

      (1c) Does semapimod binding affect PpsB activity?

      Our study suggests that semapimod treatment alters PDIM architecture, making it restrictive to L-leucine. However, the exact mechanism is not yet clear. Further studies are required to examine the effect of semapimod on Mtb PpsB activity and to characterize alterations in PDIM by mass spectrometry.

      (1d) Does the auxotrophic Mtb have lower PDIM levels compared to wild-type Mtb?

      Based on the published report by Mulholland et al. and the vancomycin susceptibility phenotype observed in our study, the two strains appear to have comparable PDIM levels.

      (2) The authors show an interesting result: semapimod exhibited antibacterial activity against H37Rv only in vivo and not in vitro. What do the authors think is the basis of this observation? It is possible that semapimod has an immunomodulatory effect on the host, since leucine is an essential amino acid in mice. The authors could check pro-inflammatory cytokine levels in infected mouse lungs with and without drug treatment.

      Semapimod inhibits the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6, which would indeed help the pathogen establish a chronic infection. However, the significant reduction in bacterial loads in the lungs and spleen upon semapimod treatment, despite the inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth.

      (3) The authors show that the semapimod-resistant auxotroph lacks PDIM. The conclusions would be further strengthened by validation with PDIM mutants, including del-ppsB Mtb and mutants in other genes of the PDIM locus, to test whether these mutants are more susceptible (or resistant) to semapimod treatment in vivo.

      PDIM is a virulence factor and plays an important role in the intracellular survival of the TB pathogen. Mtb strains lacking PDIM are expected to show attenuated growth during infection even without semapimod treatment. In such a case, it might be difficult to draw any conclusions about the effect of semapimod against PDIM(-) strains in vivo.

      (4) Prolonged subculturing can introduce mutations in PDIM genes, which can be overcome by supplementing with propionate (Mulholland et al., Nat Microbiol, 2024). Did the authors also supplement their cultures with propionate? It would be interesting to see what mutations arise in SemR strains with propionate supplementation combined with prolonged semapimod treatment.

      Because extensive subculturing may result in loss of PDIM, we avoided prolonged subculturing of bacteria. As presented in Fig. 6b, the WT bacteria retain PDIM. During the initial drug screening we did not anticipate such a phenotype, and hence bacteria were cultured in regular 7H9-OADS medium without propionate supplementation.

      A comprehensive future study would help examine the effect of propionate on the generation of semapimod-resistant mutants in Mtb mc2 6206.

      Weaknesses:

      I have summarized the limitations above in my comments. Overall, it would be helpful to provide more mechanistic details to study the connection between leucine uptake and PDIM.

      Reviewer #2 (Public review): 

      Summary

      This important study uncovers a novel mechanism for L-leucine uptake by M. tuberculosis and shows that targeting this pathway with 'Semapimod' interferes with bacterial metabolism and virulence. These results identify the leucine uptake pathway as a potential target to design new anti-tubercular therapy. 

      Strengths

      The authors took numerous approaches to prove that L-leucine uptake of M. tuberculosis is an important physiological phenomenon and may be effectively targeted by 'Semapimod'. This study utilizes a series of experiments using a broad set of tools to justify how the leucine uptake pathway of M. tuberculosis may be targeted to design new anti-tubercular therapy.

      Weaknesses

      (1) The study does not explain how L-leucine is taken up by M. tuberculosis, leaving the mechanism unclear. Even though 'Semapimod' binds to the PpsB protein, the connection between changes in PDIM and amino acid transport remains unresolved.

      While leucine uptake involves specific transporters in other bacteria, no such transport system is known in Mtb. By screening small molecule inhibitors, we identified a molecule, semapimod, that selectively kills the leucine auxotroph (mc2 6206) but not WT Mtb. To understand the mechanism underlying the differential susceptibility of the WT and auxotrophic strains to this molecule, we evaluated the effect of restoring leuCD and panCD expression on the susceptibility of the auxotrophic strain to semapimod. Interestingly, our results demonstrated that upon endogenous expression of the leuCD genes, the mc2 6206 strain becomes resistant to killing by semapimod. In contrast, panCD expression had no effect on the semapimod susceptibility of mc2 6206. These findings were further substantiated by gene expression analysis of semapimod-treated mc2 6206, which shows differential regulation of a set of genes that are altered upon leucine depletion in Mtb as well as in other bacteria. Overall, these results provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph.

      To gain further mechanistic insight into the effect of semapimod on leucine uptake in Mtb, we generated a semapimod-resistant strain that carries point mutations in four genes, including ppsB. Interestingly, overexpression of wild-type ppsB, but not of the other genes, restored the susceptibility of the resistant bacteria to semapimod. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain carries a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, we anticipate that semapimod treatment brings about modifications in PDIM that make it more restrictive to L-leucine. A comprehensive future study will help examine the effect of semapimod on Mtb physiology.

      (2) Also, the fact that the drug does not act on WT bacteria makes it a weak candidate for therapeutic use.

      We agree that semapimod is not an appropriate drug candidate against TB, owing to its inhibitory effect on the production of proinflammatory cytokines such as TNF-α, IL-1β, and IL-6 that help the pathogen establish chronic infection. However, the significant reduction in bacterial loads in the lungs and spleen upon semapimod treatment, despite the inhibition of proinflammatory cytokines, clearly indicates bacterial dependence on host-derived exogenous leucine during intracellular growth. Therefore, targeting L-leucine uptake could be a novel therapeutic strategy against TB.

      Reviewer #3 (Public review): 

      (1) Agarwal et al identified the small molecule semapimod from a chemical screen of repurposed drugs with specific antimycobacterial activity against a leucine-dependent strain of M. tuberculosis. To better understand the mechanism of action of this repurposed anti-inflammatory drug, the authors used RNA-seq to reveal a leucine-deficient transcriptomic signature from semapimod challenge. The authors then measured a decreased intracellular concentration of leucine after semapimod challenge, suggesting that semapimod disrupts leucine uptake as the primary mechanism of action. Unexpectedly, however, resistant mutants raised against semapimod had a mutation in the polyketide synthase gene ppsB that resulted in loss of PDIM synthesis. The authors believe growth inhibition is a consequence of decreased accumulation of leucine as a result of an impaired cell wall and a disrupted, unknown leucine transporter. This study highlights the importance of branched-chain amino acids for M. tuberculosis survival, and the chemical genetic interactions between semapimod and ppsB indicate that ppsB is a conditionally essential gene in a medium depleted of leucine. 

      The conclusions regarding the leucine and PDIM phenotypes are moderately supported by experimental data. The authors do not provide experimental evidence to support a specific link between leucine uptake and impaired PDIM production. Additional work is needed to support these claims and strengthen this mechanism of action.

      As mentioned above, the overall results from this study provide the first evidence of perturbation of L-leucine uptake by semapimod treatment of the leucine auxotroph. Our observations that semapimod interacts with PpsB, and that the semapimod-resistant strain carries a mutation in PpsB resulting in loss of PDIM, together support the involvement of cell-wall PDIM in the regulation of L-leucine transport in Mtb.

      As mentioned above, semapimod treatment appears to bring about modifications in PDIM that make it restrictive to L-leucine. Future studies are required to gain detailed mechanistic insight into the effect of semapimod on Mtb physiology.

      (2) Since leucine uptake and PDIM synthesis are important concepts of the manuscript, experiments would benefit from exploring other BCAAs to know if the phenotypes observed are specific to leucine, and adding additional strains to the 2D TLC experiments to provide confidence in the absence of the PDIM band.

      We thank the peer reviewer for this suggestion. We would be happy to analyse the effect of semapimod on the levels of other amino acids, including BCAAs, by mass spectrometry.

      (3) The intriguing observation that wild-type H37Rv is resistant to semapimod but the leucine-auxotroph is sensitive should be further explored. If the authors are correct and semapimod does inhibit leucine uptake through a specific transporter or disrupted cell wall (PDIM synthesis), testing semapimod activity against the leucine-auxotroph in various concentrations of BCAAs could highlight the importance of intracellular leucine. H37Rv is still able to synthesize endogenous leucine and is able to circumvent the effect of semapimod.

      We thank the peer reviewer for this suggestion. We would explore the possibility of analysing the effect of increasing concentrations of BCAAs on mc2 6206 susceptibility to semapimod.

      Recommendations for the authors:

      (1A) Intracellular leucine can decrease from:

      inhibition of transport/uptake via semapimod as the authors claim or

      decreased uptake/requirement of many metabolites due to cells entering static growth arrest from challenge by semapimod

      To rule out an indirect effect of semapimod-induced growth inhibition on L-leucine uptake, we estimated intracellular L-leucine in Mtb after a brief, 24-hour exposure to 50 ng/ml semapimod (kindly refer to Materials and Methods). We confirmed that 24 hours of treatment with 50 ng/ml semapimod does not cause the cells to enter static growth arrest.

      (1B) increased consumption/utilization of leucine for some programmed response to semapimod challenge

      Our results show reduced expression of genes involved in leucine catabolism such as accD1, bkdA and bkdB in semapimod-treated cells, and thus the above hypothesis seems unlikely.

      (1C) Additional metabolites should be measured to determine the specificity of the semapimod challenge.

      As mentioned below, we measured intracellular valine in the semapimod-treated Mtb 6206 by LC-MS/MS, which shows no change in its level. These observations thus corroborate a specific effect of semapimod on L-leucine level in the cell.

      (2) The effect of Semapimod on L-leucine uptake is largely based on indirect evidence, without showing reduced transport of the amino acid. Gene expression data is not enough to prove that the amino acid transport is blocked. More compelling evidence is required to confirm this mechanism.

      The authors could perform leucine uptake assays to directly confirm that Semapimod inhibits L-leucine transport. Another possibility would be to measure intra-bacterial leucine levels in drug-treated versus untreated M. tuberculosis strains.

      Data presented in Fig. 3b show lower intracellular L-leucine upon semapimod treatment; in contrast, the SemR strain exhibits ~3-fold more intracellular L-leucine, as estimated by mass spectrometry (kindly refer to our response to comment #6 below). Together, these observations indicate an inhibitory effect of semapimod on L-leucine uptake by the auxotroph.

      (3) The authors show that overexpression of leuC-leuD restores Semapimod resistance in the auxotroph (Figs. 3C-3E). Is it possible to examine the Semapimod resistance of WT-H37Rv or the complemented mutant grown in leucine-limiting conditions? Such evidence would speak more directly to the specific drug target beyond the auxotroph (mc2 6206).

      Because the endogenous L-leucine synthesis pathway is functional in WT-H37Rv as well as in the complemented auxotrophic strain, leucine-limiting conditions are not expected to affect susceptibility to semapimod.

      Author response image 1.

      (4) Biolayer interferometry (BLI) shows that Semapimod binds to PpsB (Fig. 6); however, there is no clear evidence that it disrupts PDIM synthesis. More direct evidence would come from studying the effect of Semapimod on a ppsB mutant (perhaps a knockdown). This would establish the specificity of Semapimod for PpsB. Likewise, it would be worth examining the effect of Semapimod on mutant M. tuberculosis defective in PDIM synthesis.

      As recommended by the peer reviewer, we created a ppsB knockdown strain in Mtb mc2 6206 by CRISPRi and examined its vulnerability to semapimod treatment. As shown in Author response image 1, the ppsB KD strain is less susceptible to semapimod than the pDcas9 control strain, which exhibits significant growth inhibition on 7H11-OADS-PL agar plates containing 200 nM semapimod.

      (5) Metabolomics experiments would benefit from including other control BCAAs like isoleucine and valine to determine if decreased intracellular levels of leucine are specific to semapimod or a general consequence of growth arrest from an antimicrobial agent.

      As suggested by the reviewer, we measured intracellular valine and proline levels in semapimod-treated Mtb 6206 by LC-MS/MS; the data presented in Supplementary Figure 5 show no change in their levels upon semapimod treatment.

      (5) Figure 3c, pyrazinamide susceptibility assay could be included on the panCD strain to ensure complementation leads to functional panCD. Parent strain would be resistant to PZA, complement strain would be susceptible. (doi: 10.1038/s41467-019-14238-3).

      The wild-type Mtb 6206 is unable to grow in the absence of pantothenate. We verified resumption of growth of Mtb 6206 in 7H9-OADS-L-leucine medium lacking pantothenate upon PanCD overexpression, which provides more direct evidence of the expression of functional copies of panCD genes.

      (6) Does the SemR mutant have increased levels of leucine?

      As shown in Supplementary Figure 7, the SemR strain shows a ~3.0-fold increase in intracellular L-leucine compared with the WT strain. In contrast, a comparable level of another BCAA, valine, is observed in both strains.

    1. eLife Assessment

      This study presents valuable findings on the differential effects of RNA on the phase separation, aggregation dynamics, and bioactivity of PSMα3 and LL-37. The authors provide solid evidence from complementary biophysical and cell-based experiments that RNA influences peptide assembly and associated in vitro activities. The study is of interest for understanding interactions between amyloidogenic peptides and nucleic acids, although the physiological significance and some aspects of the mechanistic interpretation would benefit from further clarification.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to elucidate the role of RNA as a context-dependent modulator of liquid-liquid phase separation (LLPS), aggregation, and bioactivity of the amyloidogenic peptides PSMα3 and LL-37, motivated by their structural and functional similarities.

      Strengths:

      The authors combine extensive biophysical characterization with cell-based assays to investigate how RNA differentially regulates peptide aggregation states and associated cytotoxic and antimicrobial functions.

      Weaknesses:

      While the study addresses an interesting and timely question with potentially broad implications for host-pathogen interactions and amyloid biology, some aspects of the experimental design and data analysis require further clarification and strengthening.

    3. Reviewer #2 (Public review):

      In this paper, Rayan et al. report that RNA influences the cytotoxic activity of the staphylococcal secreted peptide cytolysin PSMalpha3 toward human cells and E. coli by affecting its aggregation. The authors used sophisticated methods of structural analysis and describe the associated liquid-liquid phase separation. They also compare this with the influence of RNA on the aggregation and activity of LL-37, which differs from its influence on PSMalpha3.

      That RNA impacts PSM cytotoxicity when co-incubated in vitro becomes clear. However, I have two major problems with this study:

      (1) The premise, as stated in the introduction and elsewhere, that PSMalpha3 amyloids are biologically functional, is highly debatable and has never been conclusively substantiated. The property that matters most for the present study, cytotoxicity, is generally attributed to PSM monomers, not amyloids. The likely erroneous notion that PSM amyloids are the predominant cytotoxic form is derived from an earlier study by the authors that has described a specific amyloid structure of aggregated PSMalpha3. Other authors have later produced evidence that, quite unsurprisingly, indicated that aggregation into amyloids decreases, rather than increases, PSM cytotoxicity. Unfortunately, yet other groups have in the meantime published in-vitro studies on "functional amyloids" by PSMs without critically challenging the concept of PSM amyloid "functionality". Of note, the authors' own data in the present study that show strongly decreased cytotoxicity of PSMalpha3 after prolonged incubation are in agreement with monomer-associated cytotoxicity as they can be easily explained by the removal of biologically active monomers from the solution.

      In their revision and in the rebuttal, the authors have further described their concept regarding what they call "functionality" of PSMalpha3 amyloids. They now admit that monomers are the active cytolytic form, as other researchers have stressed, whereas amyloids are not. This represents a considerable difference from earlier papers in which they ascribed functionality, i.e. cytolytic capacity, to PSMalpha3 amyloids, a claim that has raised considerable controversy. Now, they use the term "functional" to describe that PSMalpha3 amyloids, while not cytolytic, can be reversed to a cytolytic monomeric state, calling them a "dynamic reservoir". There is no evidence that such a reservoir is necessary for the cytolytic activity of the monomers to be established; also, there is no evidence that such an amyloid reservoir exists in a biological system. To continue calling PSMalpha3 amyloids "functional" based on this considerably changed concept appears inappropriate, given the finally admitted absence of cytolytic activity of the PSM amyloids in addition to the continuing complete lack of evidence of any biological relevance of PSM amyloid formation.

      (2) That RNA may interfere with PSM aggregation and influence activity is not very surprising, given that PSM attachment to nucleic acids - while not studied in as much detail as here - has been described. Importantly, it does not become clear whether this effect has biologically significant consequences beyond influencing, again not surprisingly, cytotoxicity in vitro. The authors do show in nice microscopic analyses that labeled PSMalpha3 attaches to nuclei when incubated with HeLa cells. However, given that the cells are killed rapidly by membrane perturbation by the applied PSM concentrations, it remains unclear and untested whether the attachment to nucleic acids in dying cells makes any contribution to PSM-induced cell death or has any other biological significance.

      Overall, the findings can be explained in a much more straightforward way with the common concept of cytotoxicity being due to monomeric PSMs, and the impact of nucleic acids on cytotoxicity being due to lowering of the concentration of that active form by RNA attachment. Further limiting the significance of the findings, whether this interaction has any biological significance on the physiology or infectivity of the PSM producer remains largely unexplored.

      Further remarks:

      • Circumstantial evidence based on the "amyloid inhibitor", EGCG: The results with EGCG, which has been shown to have a moderate amyloid-reducing effect on PSMalpha1 and PSMalpha4, should not be taken as evidence for amyloid-based cytotoxicity. While increased concentrations of EGCG reduced the cytotoxic effect of PSMalpha3, it is not convincingly shown that this is due to a lower concentration of amyloid vs. monomeric PSM.

      • It is appreciated that the authors refrain from presenting the unsubstantiated concept of "functional" PSM amyloids in the discussion. However, wording in that direction must also be removed from other parts of the manuscript (e.g., "bioactive fibrillar polymorphs", "The formation of cross-alpha amyloids has been correlated with toxic activity", etc.). The authors should generally refrain from uncritically implying that amyloid formation underlies PSM biological activity, and should rather discuss that the much more likely explanation of the findings is a lowering of the concentration of cytolytically active, monomeric PSM.

      • Discussion: "PSM alpha3 interaction with nucleic acids within human cells ...supports a comparable mechanism...". Delete. Unsubstantiated.

      • The authors should cite papers that have argued against their hypothesis and not only their own manuscripts.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to elucidate the role of RNA as a context-dependent modulator of liquid-liquid phase separation (LLPS), aggregation, and bioactivity of the amyloidogenic peptides PSMα3 and LL-37, motivated by their structural and functional similarities.

      Strengths:

      The authors combine extensive biophysical characterization with cell-based assays to investigate how RNA differentially regulates peptide aggregation states and associated cytotoxic and antimicrobial functions.

      Weaknesses:

      While the study addresses an interesting and timely question with potentially broad implications for host-pathogen interactions and amyloid biology, several aspects of the experimental design and data analysis require further clarification and strengthening.

      Major Comments:

      (1) In Figure 1A, the authors showed "stronger binding affinity" based on shifts at lower peptide concentrations, but no quantitative binding parameters (e.g., apparent Kd, fraction bound, or densitometric analysis) are presented. This claim would be better supported by including (i) a binding curve with quantification of free vs. bound RNA band intensities and (ii) replicates and error estimates (mean ± SD).

      We thank the reviewer for this suggestion. To quantitatively support the binding differences observed in Figure 1A, we have now performed densitometric analysis of the EMSA data and included the results in Figure S1. The analysis showed that the Kd values for PSMα3 binding to polyAU and polyA RNA are of the same order of magnitude, but lower for polyAU, indicating stronger binding. A description was added to the Results in lines 137-145 of the revised version.
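      For readers unfamiliar with deriving an apparent Kd from EMSA densitometry, the following is a minimal sketch, not the authors' pipeline. It assumes a single-site model with peptide in large excess over RNA, so that fraction bound θ = [P]/(Kd + [P]). It also assumes that the free-RNA band intensity is proportional to unbound RNA. The function names, the log-spaced grid-search fit (used instead of a nonlinear solver), and the synthetic numbers are illustrative assumptions. If PSMα3–RNA binding is cooperative, a Hill-type model would be more appropriate.

```python
import math


def fraction_bound(free_band, free_band_no_peptide):
    """Fraction of RNA shifted, inferred from loss of the free-RNA band.

    Assumption: free-band intensity is proportional to unbound RNA.
    """
    return [1.0 - f / free_band_no_peptide for f in free_band]


def single_site(conc, kd):
    """Hyperbolic single-site model, peptide in excess: theta = [P] / (Kd + [P])."""
    return conc / (kd + conc)


def fit_apparent_kd(concs, theta, kd_min=1e-3, kd_max=1e3, steps=2001):
    """Least-squares apparent Kd via a log-spaced grid search (no SciPy needed)."""
    lo, hi = math.log10(kd_min), math.log10(kd_max)
    best_kd, best_sse = None, math.inf
    for i in range(steps):
        kd = 10 ** (lo + (hi - lo) * i / (steps - 1))
        # Sum of squared residuals between observed and modelled fraction bound
        sse = sum((t - single_site(c, kd)) ** 2 for c, t in zip(concs, theta))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


if __name__ == "__main__":
    # Synthetic, noise-free lanes generated with Kd = 5 (arbitrary concentration units)
    concs = [0.5, 1, 2, 5, 10, 20, 50]
    free = [1000 * (1 - c / (5 + c)) for c in concs]
    theta = fraction_bound(free, 1000)
    print(f"apparent Kd = {fit_apparent_kd(concs, theta):.2f}")
```

      With real gels, each replicate would be fitted separately so that a mean ± SD of the apparent Kd can be reported.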

      (2) The authors report droplet formation at low RNA (50 ng/µL) but protein aggregation at high RNA (400 ng/µL) through fluorescence microscopy. However, no intermediate RNA concentrations (e.g., 100-300 ng/µL) are tested or discussed, leaving a critical gap in understanding the full phase diagram and transition mechanisms.

      Our initial choice of 50 ng/µL (low RNA) and 400 ng/µL (high RNA) was guided by a broader RNA titration performed by turbidity measurements across 0, 10, 20, 50, 100, 200, and 400 ng/µL (Figure S2 in the revised version). In this screen, turbidity increased up to 50 ng/µL and then decreased dose-dependently from 100–400 ng/µL. We interpret this non-monotonic behavior as consistent with a transition from a droplet-rich regime (maximal light scattering at intermediate dense-phase volume) toward conditions where assemblies become larger and/or more compact and sediment out of the optical path. This is described in lines 158-161 of the revised version.

      Of note, additional intermediate RNA conditions (100 and 200 ng/µL) are included in Figure S14 (of the revised version). While these experiments were performed under the heat-shock perturbation, they nevertheless support the central point that RNA tunes assembly state across intermediate concentrations rather than producing a binary low/high outcome.

      Importantly, we agree with the reviewer that a full phase diagram would be the most rigorous way to define the transition mechanism. However, establishing csat and constructing a complete phase diagram would require systematic measurements of dilute-phase concentrations (e.g., centrifugation/quantification or fluorescence calibration), controlled ionic strength titrations, and time-resolved mapping, which is beyond the scope of the present study. We have therefore revised the text to avoid implying that we provide a complete phase diagram. Instead, we frame our results as a qualitative, multi-assay characterization showing that RNA concentration drives a shift from liquid-like condensates (at low RNA) toward solid-like assemblies (at high RNA), with an intermediate regime suggested by the turbidity transition and supported by additional imaging under stress. Finally, to address the “critical gap” concern directly, we add a sentence (lines 239-241) stating that: “Future work will be required to quantitatively define the phase boundaries and delineate the dominant mechanisms, such as sedimentation, dissolution, or coarsening/aging, across intermediate RNA concentrations”.

      (3) Additionally, the behaviour of PSMα3 in the absence of RNA under LLPS conditions is not shown. Without protein-only data, it is difficult to assess if droplets are RNA-induced or if protein has a weak baseline LLPS that RNA tunes. The saturation concentration (csat) for PSMα3 phase separation, either in the absence or presence of RNA, should be reported.

      In response to the reviewer’s request, we have added Figure 2F, which shows PSMα3 alone in the absence of RNA under the same conditions. PSMα3 does not form droplets in this condition, indicating that condensate formation is RNA-dependent in the tested conditions. This is referred to in the text in lines 190-193 of the revised version. Please see our response about determining the csat in the response to the previous comment.

      (4) For a convincing LLPS claim, it is important to show quantitative FRAP curves (mobile fraction and half-time of recovery) rather than only microscopy images and qualitative statements.

      We have included quantitative FRAP analysis in Figure S4 of the revised version, showing normalized recovery curves along with extracted mobile fractions and half-times of recovery (t₁/₂). These quantitative measurements support the dynamic nature of the PSMα3–RNA condensates. This is referred to in the text in lines 179-184 of the revised version.
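      As an illustration of how the two FRAP parameters mentioned above are commonly extracted, here is a minimal, model-free sketch. It is not the authors' analysis, which is not described here. It assumes an already background- and acquisition-bleaching-corrected trace whose first sample is the first post-bleach frame. It also assumes that the plateau can be estimated from the last few frames. The mobile fraction is computed as (F∞ − F0)/(Fpre − F0), and t₁/₂ is found by linear interpolation at half-maximal recovery. An exponential fit, with t₁/₂ = τ·ln 2, is a common alternative. The function name and the plateau_points parameter are hypothetical.

```python
import math


def frap_metrics(times, intensities, pre_bleach, plateau_points=3):
    """Model-free mobile fraction and recovery half-time from one FRAP trace.

    times, intensities: post-bleach samples; index 0 is the first frame after bleaching.
    pre_bleach: mean intensity of the bleached region before bleaching.
    Assumes background subtraction and acquisition-bleaching correction are done.
    """
    f0 = intensities[0]
    # Plateau taken as the mean of the last frames (assumes recovery has levelled off)
    f_inf = sum(intensities[-plateau_points:]) / plateau_points
    if f_inf <= f0:
        raise ValueError("no recovery detected")
    mobile_fraction = (f_inf - f0) / (pre_bleach - f0)
    half_level = f0 + 0.5 * (f_inf - f0)
    for i in range(1, len(times)):
        if intensities[i] >= half_level:
            # Linear interpolation between the two frames bracketing half recovery
            t_a, t_b = times[i - 1], times[i]
            f_a, f_b = intensities[i - 1], intensities[i]
            t_half = t_a + (half_level - f_a) * (t_b - t_a) / (f_b - f_a)
            return mobile_fraction, t_half - times[0]
    raise ValueError("trace never reaches half-maximal recovery")


if __name__ == "__main__":
    # Synthetic trace: bleached to 0.2, recovers by 0.6 with tau = 2 s; pre-bleach = 1.0
    ts = [0.5 * i for i in range(81)]
    trace = [0.2 + 0.6 * (1 - math.exp(-t / 2.0)) for t in ts]
    mf, t_half = frap_metrics(ts, trace, pre_bleach=1.0)
    print(f"mobile fraction = {mf:.2f}, t1/2 = {t_half:.2f} s")
```

      On this synthetic trace the interpolated half-time lies close to the analytical τ·ln 2 of about 1.39 s; the small discrepancy comes from interpolating between frames 0.5 s apart.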

      (5) The manuscript highly relies on fluorescence microscopy to show colocalization. However, the colocalization is presented in a qualitative manner only. The manuscript would benefit from the inclusion of quantitative metrics (e.g., Pearson's correlation coefficient, Manders' overlap coefficients, or intensity correlation analysis).

      In response, we have added quantitative colocalization analysis to the revised manuscript. Specifically, we now report Pearson’s correlation coefficients and Manders’ overlap coefficients for the dual-channel fluorescence microscopy datasets in Figure S5 of the revised version. These metrics provide an objective measure of co-distribution and complement the qualitative imaging.

      The analysis supports that at low RNA concentrations (droplet/condensate conditions), PSMα3 and RNA show strong colocalization, consistent with RNA being incorporated within, or closely associated with, the peptide-rich phase. In contrast, at high RNA concentrations, where the assemblies are more solid-like/amyloid-positive, the quantitative coefficients decrease, consistent with reduced overlap and an apparent spatial demixing in which RNA becomes partially excluded from the peptide-rich structures. This is referred to in the text in lines 194-203 of the revised version.
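      The two colocalization metrics named in this response can be sketched as follows, operating on flattened pixel intensities of the two channels. This is a minimal illustration, not the authors' workflow. Pearson's coefficient measures linear correlation of intensities across all pixels. Manders' M1 and M2 give the fraction of each channel's total signal located in pixels where the other channel is above a threshold. The threshold convention used here is one simple choice; tools such as Costes automatic thresholding define it differently. The function names and the threshold arguments are hypothetical.

```python
import math


def pearson(ch1, ch2):
    """Pearson's correlation coefficient between two equally sized, flattened channels."""
    n = len(ch1)
    m1, m2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    var1 = sum((a - m1) ** 2 for a in ch1)
    var2 = sum((b - m2) ** 2 for b in ch2)
    return cov / math.sqrt(var1 * var2)


def manders(ch1, ch2, thr1=0.0, thr2=0.0):
    """Manders' overlap coefficients (M1, M2).

    M1: fraction of total channel-1 intensity in pixels where channel 2 exceeds thr2.
    M2: fraction of total channel-2 intensity in pixels where channel 1 exceeds thr1.
    """
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / sum(ch1)
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / sum(ch2)
    return m1, m2


if __name__ == "__main__":
    # Toy 2x2 images flattened row by row: peptide and RNA overlap in one pixel only
    peptide = [10, 10, 0, 0]
    rna = [0, 10, 10, 0]
    print(pearson(peptide, rna), manders(peptide, rna))
```

      Partial overlap of this kind lowers both coefficients, which is the direction of change the response describes for RNA being partially excluded from peptide-rich structures at high RNA concentrations.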

      (6) In Figures 3 B and 3C, the contrast between "no AT630 at 30 min, strong at 2 h" (50 ng/μL) and "strong at 30 min" (400 ng/μL) is compelling, but a simple quantification (e.g., mean fluorescence intensity per area) would greatly increase rigor.

      We have included quantitative analysis of AmyTracker630 fluorescence intensity in Figure S6 of the revised version, reporting the mean fluorescence intensity per area for the indicated conditions and time points. This quantification supports the qualitative differences observed in Figures 3B and 3C. This is now referred to in the text in lines 233-236 of the revised version.

      (7) In Figure S3 ssCD data, if possible, indicate whether the α-helical signal increases with RNA concentration or shows a non-linear dependence, which might link to the LLPS vs solid aggregate regimes.

      The ssCD spectra displayed in Figure S7 in the revised version (corresponding to Figure S3 in the original submission) show that the α-helical signature of PSMα3 is markedly enhanced in the presence of RNA compared to peptide alone, as evidenced by increased signal intensity, deeper minima, and more pronounced spectral features characteristic of α-helical structure. Importantly, this enhancement is more pronounced at 400 ng/µL Poly(AU) RNA than at 50 ng/µL, particularly after 2 hours of coincubation, indicating that RNA concentration influences the stabilization of α-helical assemblies. This is now more specifically detailed in the text in lines 258-263 of the revised version.

      We note that solid-state CD does not allow direct quantitative deconvolution of secondary structure content (e.g., % helix) in the same manner as solution CD, due to sample anisotropy, scattering, and orientation effects inherent to dried or aggregated films. Consequently, our interpretation is qualitative rather than strictly quantitative. Within these limits, the ssCD data suggest a non-linear dependence on RNA concentration rather than a simple linear dose–response. This is also expected, given that the phase transition suggested by our other findings is intrinsically non-linear.

      (8) In Figure 5B, FRAP recovery in dying cells may reflect artifactual mobility rather than biological relevance. Additionally, the absence of quantification data limits interpretation; providing recovery curves would clarify relevance.

      We have added quantitative FRAP analysis of PSMα3 within HeLa cells, shown in Figure S8 of the revised version. Compared to PSMα3 assemblies in vitro, nucleolar PSMα3 exhibits slower fluorescence recovery and a reduced mobile fraction. The nucleolus represents a highly crowded, RNA-rich cellular environment, which is expected to impose additional constraints on molecular mobility and likely contributes to the slower recovery kinetics observed in cells. This is now more specifically detailed in the text in lines 324-333 and discussed in lines 597-607 of the revised version.
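
      A FRAP trace is commonly normalized so that the first post-bleach frame equals 0 and the pre-bleach level equals 1. The plateau of the normalized curve then estimates the mobile fraction, and the time to reach half of that plateau gives a recovery half-time. The sketch below illustrates this convention under stated assumptions: intensities are already background-subtracted, no correction is made for acquisition photobleaching, and the plateau is taken as the mean of the last few frames. It is not the authors' analysis code, and `frap_metrics` is a hypothetical helper.

```python
import numpy as np


def frap_metrics(t, f, f_pre, bleach_index, plateau_points=5):
    """Normalize a FRAP trace and estimate mobile fraction and half-time.

    t, f: time points and bleached-ROI intensities (assumed background-subtracted).
    f_pre: mean pre-bleach intensity of the ROI.
    bleach_index: index of the first post-bleach frame.
    """
    t = np.asarray(t, dtype=float)
    f = np.asarray(f, dtype=float)
    f0 = f[bleach_index]                       # intensity immediately after bleaching
    norm = (f - f0) / (f_pre - f0)             # 0 just after bleach, 1 = full recovery
    post_t = t[bleach_index:] - t[bleach_index]
    post = norm[bleach_index:]
    mobile = float(post[-plateau_points:].mean())  # plateau of the normalized curve

    # Half-time: first crossing of half the plateau, by linear interpolation.
    # If there is no recovery (mobile <= 0), the returned half-time is meaningless.
    half = mobile / 2
    idx = int(np.argmax(post >= half))
    if idx == 0:
        t_half = 0.0
    else:
        t0, t1 = post_t[idx - 1], post_t[idx]
        y0, y1 = post[idx - 1], post[idx]
        t_half = float(t0 + (half - y0) * (t1 - t0) / (y1 - y0))
    return norm, mobile, t_half
```

      For a single-exponential recovery, this interpolated half-time approaches τ·ln 2 as the frame interval shrinks. A reduced mobile fraction, as reported for nucleolar PSMα3, appears as a plateau well below 1.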

      (9) The narrative conflates cytotoxicity endpoints (membrane damage, PI staining, aggregates) with localization data (nucleolar foci), creating ambiguity about whether nucleolar targeting drives toxicity or is a consequence of cell death. Separating toxicity assessment from localization analysis, or clearly demonstrating that nucleolar accumulation precedes cytotoxicity, would resolve this ambiguity.

      We thank the reviewer for raising this important point. We agree that, in the current dataset, cytotoxicity readouts (membrane damage, PI staining, aggregate formation) and subcellular localization (nucleolar accumulation) are observed in close temporal proximity, which limits our ability to unambiguously assign causality. In the experiments presented here, PSMα3 was applied at concentrations known to induce rapid membrane disruption and cytotoxicity in HeLa cells. Under these conditions, PSMα3 accumulates on cellular membranes and penetrates into the cell and nucleus on very short timescales (seconds to minutes), likely preceding the temporal resolution accessible by standard live-cell fluorescence microscopy. As a result, nucleolar accumulation and cytotoxic endpoints are detected essentially concurrently, precluding a definitive determination of whether nucleolar association actively drives toxicity or occurs as a downstream consequence of membrane permeabilization and cell damage.

      We therefore emphasize that, in this study, nucleolar localization is presented as a phenomenological observation consistent with RNA-rich compartment association, rather than as a demonstrated causal mechanism of cytotoxicity. We have revised the Discussion (lines 597-607) to clarify this distinction and to avoid implying that nucleolar targeting is the primary driver of cell death.

      We agree that resolving this ambiguity would require systematic time-resolved and concentration-dependent experiments, including analysis at sub-toxic PSMα3 concentrations below the membrane-disruptive threshold, combined with orthogonal imaging approaches. Such experiments are planned for future work but are beyond the scope of the present study.

      (10) In Figure 8, to strengthen the LLPS assignment for LL-37, additional evidence, such as FRAP analysis or observation of droplet fusion events, would be valuable. This is particularly relevant given that the heat shock conditions (65 °C for 15 minutes) could potentially induce partial denaturation or nonspecific coacervation.

      In response to this comment, we have added FRAP analysis of LL-37 assemblies in the revised manuscript (Figure S12), including representative images and corresponding fluorescence recovery curves. The FRAP measurements show minimal fluorescence recovery over the acquisition window, indicating that the LL-37–RNA assemblies formed under these conditions are largely immobile and solid-like, rather than liquid-like droplets. This is now referred to in the text in lines 458-462 of the revised version.

      Reviewer #2 (Public review):

      In this paper, Rayan et al. report that RNA influences the cytotoxic activity of the staphylococcal secreted peptide cytolysin PSMalpha3 against human cells and E. coli by impacting its aggregation. The authors used sophisticated methods of structural analysis and described the associated liquid-liquid phase separation. They also compare the influence of RNA on the aggregation and activity of LL-37, which differs from its influence on PSMalpha3.

      Strengths:

      That RNA impacts PSM cytotoxicity when co-incubated in vitro becomes clear. 

      Weaknesses:

      I have two major and fundamental problems with this study:

      (1) The premise, as stated in the introduction and elsewhere, that PSMalpha3 amyloids are biologically functional, is highly debatable and has never been conclusively substantiated. The property that matters most for the present study, cytotoxicity, is generally attributed to PSM monomers, not amyloids. The likely erroneous notion that PSM amyloids are the predominant cytotoxic form is derived from an earlier study by the authors that has described a specific amyloid structure of aggregated PSMalpha3. Other authors have later produced evidence that, quite unsurprisingly, indicated that aggregation into amyloids decreases, rather than increases, PSM cytotoxicity. Unfortunately, yet other groups have, in the meantime, published in-vitro studies on "functional amyloids" by PSMs without critically challenging the concept of PSM amyloid "functionality". Of note, the authors' own data in the present study, which show strongly decreased cytotoxicity of PSMalpha3 after prolonged incubation, are in agreement with monomer-associated cytotoxicity as they can be easily explained by the removal of biologically active monomers from the solution.

      We thank the reviewer for this important critique and agree that direct cytotoxicity is most plausibly mediated by soluble PSM species, while extensive fibrillation generally reduces toxicity by depleting these forms, a conclusion supported by our data and by other studies (e.g., Zheng et al., 2018; Yao et al., 2019). We do not propose mature amyloid fibrils as the primary toxic entities. Rather, we use the term functional amyloid in a regulatory sense, consistent with other biological amyloids whose fibrillar states modulate activity (e.g., hormone storage amyloids or RNA-binding proteins).

      In line with emerging findings, we interpret PSMα3 toxicity as arising from a dynamic assembly process rather than from a single static molecular species. We previously showed that PSMα3 forms cross-α fibrils that are thermodynamically and mechanically less stable than cross-β amyloids and readily disassemble upon heat stress, fully restoring cytotoxic activity (Rayan et al., 2023). This behavior contrasts with PSMα1, which forms highly stable cross-β fibrils that do not recover activity after heat shock, suggesting that the limited thermostability of PSMα3 is an evolved feature enabling reversible switching between inactive (stored) and active states.

      Consistent with this view, both PSMα1 and PSMα3 are cytotoxic in their soluble states, yet mutants unable to fibrillate lose activity, indicating that fibrillation is required but not itself the toxic end state (Tayeb-Fligelman et al., 2017, 2020; Malishev et al., 2018). Our other studies further show that cytotoxicity toward human cells correlates with inherent or lipid-induced α-helical assemblies, rather than with inert β-sheet amyloids (Ragonis-Bachar et al., 2022, 2026; Salinas et al., 2020; Bücker et al., 2022). Together, these findings support a model in which membrane-associated, dynamic α-helical assembly, which requires continuous exchange between soluble species and growing fibrils, drives membrane disruption, potentially through lipid recruitment or extraction, analogous to mechanisms proposed for human amyloids such as islet amyloid polypeptide (Sparr et al., 2004).

      In the present study, we further show that RNA reshapes this dynamic landscape: while PSMα3 alone progressively loses activity upon incubation, co-incubation with RNA preserves cytotoxicity by stabilizing bioactive polymorphs and condensate-like states, whereas high RNA concentrations promote solid aggregation but nevertheless preserve activity. Thus, aggregation is neither inherently functional nor toxic, but context dependent and environmentally regulated. Taken together, our data support a model in which PSMα3 amyloids act as a dynamic reservoir, enabling S. aureus to tune virulence by reversibly shifting between dormant and active states in response to environmental cues such as heat or RNA.

      This is now discussed in lines 56-76 and 523-553 of the revised version.

      (2) That RNA may interfere with PSM aggregation and influence activity is not very surprising, given that PSM attachment to nucleic acids - while not studied in as much detail as here - has been described. Importantly, it does not become clear whether this effect has biologically significant consequences beyond influencing, again not surprisingly, cytotoxicity in vitro. The authors do show in nice microscopic analyses that labeled PSMalpha3 attaches to nuclei when incubated with HeLa cells. However, given that the cells are killed rapidly by membrane perturbation by the applied PSM concentrations, it remains unclear and untested whether the attachment to nucleic acids in dying cells makes any contribution to PSM-induced cell death or has any other biological significance.

      We thank the reviewer for this important point and agree that PSM–nucleic acid interactions are not unexpected and that our data do not support a direct intracellular role for RNA binding in mediating cytotoxicity. Accordingly, we do not propose nucleolar or nuclear association of PSMα3 as a causal mechanism of cell death. At the concentrations used, PSMα3 induces rapid membrane disruption, and nucleic acid association is observed along with membrane attachment, precluding conclusions about intracellular function. This limitation is now explicitly clarified in the revised manuscript. The biological significance of our findings lies instead in extracellular and environmental contexts, where PSMα3 encounters abundant nucleic acids, such as RNA or DNA released from damaged host cells or present in biofilms, as now addressed in lines 622–631. Our data show that RNA modulates PSMα3 aggregation trajectories, shifting the balance between liquid-like condensates and solid aggregates, and thereby regulates the persistence and timing of cytotoxic activity. In this framework, RNA acts as a context-dependent regulator of virulence rather than as an intracellular cytotoxic cofactor, an aspect that will be studied in depth in future work. This is now addressed in the text in lines 597-607 of the revised version.

      Reviewer #3 (Public review):

      Summary:

      The manuscript by Rayan et al. aims to investigate the role of RNA in modulating both virulent amyloid and host-defense peptides, with the objective of understanding their self-assembly mechanisms, morphological features, and aggregation pathways. 

      Strengths:

      The overall content is well-structured with a logical flow of ideas that effectively conveys the research objectives.

      Weaknesses:

      (1) Figure 2 displays representative FRAP images demonstrating fluorescence recovery within seconds. To gain a more comprehensive understanding of how recovery after photobleaching varies under different conditions, it is recommended to supplement these images with corresponding quantitative fluorescence recovery curves for analysis.

      In response to this comment, we have supplemented the representative FRAP images with quantitative fluorescence recovery curves, reporting normalized recovery kinetics for the indicated conditions. These data are now provided in Figure S4 of the revised manuscript, allowing direct comparison of recovery behavior across conditions (shown by microscopy in Figure 2). In addition, we have included quantitative FRAP analyses for the cellular imaging shown in Figure 5 (presented in Figure S8) and for LL-37 assemblies formed under heat-shock conditions (Figure S12). Together, these additions provide a quantitative framework for interpreting the FRAP results and strengthen the distinction between liquid-like and solid-like assembly states.

      (2) Ostwald ripening typically leads to the shrinkage or even disappearance of smaller droplets, accompanied by the further growth of large droplets. However, the droplet size in Figure 2D decreases significantly after 2 h of incubation. This observation prompts the question, what is the driving force underlying RNA-regulated phase separation and phase transition?

      We thank the reviewer for this observation. Across multiple samples, we consistently observe a coexistence of small droplets and larger aggregates, rather than systematic growth of larger droplets at the expense of smaller ones or a uniform decrease in droplet size. In addition, the timescales examined do not allow us to reliably assess whether diffusion-driven droplet coalescence is fast enough to affect droplet size, precluding firm conclusions about droplet size evolution. This is now addressed in the text in lines 181-184 of the revised version.

      A decrease in droplet size over time is nevertheless observed in some instances and is more consistent with a time-dependent conversion of initially liquid-like condensates into more solid-like assemblies, which would reduce molecular mobility and suppress droplet coalescence. In parallel, progressive fibril formation may act as a sink for soluble peptide, leading to partial dissolution or shrinkage of less mature condensates. Together, these observations are consistent with a non-equilibrium aging process, in which RNA-regulated assemblies evolve from dynamic condensates toward more solid structures rather than following equilibrium Ostwald ripening.

      (3) The manuscript aims to study the role of RNA in modulating PSMα3 aggregation by using solution-state NMR to obtain residue-specific structural information. The current NMR data, as described in the methods and figure captions, were recorded in the absence of RNA. Does RNA binding induce conformational changes in PSMα3, and how would these changes alter the NMR spectra? Also, the sequential NOE walk between neighboring residues can be annotated on the spectrum for clarity.

      The solution-state NMR experiments were performed specifically to characterize the potential binding of EGCG to PSMα3. Due to the strong tendency of PSMα3 to undergo rapid aggregation and line broadening upon RNA addition, solution-state NMR spectra in the presence of RNA could not be obtained at sufficient quality for residue-specific analysis. As suggested, we have updated and annotated the sequential NOE walk between neighboring residues on the relevant NOESY spectra to improve clarity.

      (4) The authors claim that LL-37 shares functional, sequence, and structural similarities with PSMα3. However, no droplet formation was observed of LL-37 in the presence of RNA only. The authors then applied thermal stress to induce phase separation of LL-37. What are the main factors contributing to the different phase behaviors exhibited by LL-37 and PSMα3? What are the differences in the conformation of amyloid aggregates and the kinetics of aggregation between the condensation-induced aggregation in the presence of RNA and the conventional nucleation-elongation process in the absence of RNA for these two proteins?

      We appreciate this important question and have clarified both the basis of the comparison and the origin of the divergent phase behaviors of LL-37 and PSMα3. While PSMα3 and LL-37 share key properties as short, cationic, amphipathic α-helical peptides that self-assemble and interact with nucleic acids, they differ fundamentally in their assembly architectures. PSMα3 is an amyloidogenic peptide that forms cross-α amyloid fibrils, in which α-helices stack perpendicular to the fibril axis. In contrast, LL-37 can form fibrillar or sheet-like assemblies (observed on cryo-EM grids), but these lack canonical amyloid features, without clear cross-α or cross-β amyloid order, as observed so far in crystal structures. This is now clarified in different parts of the text of the revised version. Thus, the comparison between the two peptides is functional and physicochemical rather than implying identical amyloid mechanisms. These structural differences likely underlie their distinct phase behaviors.

      Because LL-37 does not follow a classical amyloid nucleation–elongation pathway, and high-resolution structural information (e.g., cryo-EM) is currently lacking, partly due to its sheet-like, non-twisted morphology (unpublished results), it is not possible to directly compare aggregation kinetics or nucleation mechanisms between LL-37 and PSMα3. It is possible that amyloidogenic systems such as PSMα3 exhibit greater flexibility in prefibrillar and fibrillar polymorphism, enabling RNA-regulated phase behavior, whereas non-amyloid assemblies such as LL-37 are more prone to stress-induced solid aggregation. We note that this interpretation is necessarily tentative and does not imply a general rule, but rather reflects differences evident in the present system.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Minor Comments:

      (1) In the abstract, replacing the word "overriding" with "counteracting" may provide a scientifically neutral tone.

      In the course of revision, the abstract was substantially rewritten to more precisely convey the mechanistic framework and key conclusions of the study. As part of this rewrite, the term "overriding" was removed and the language throughout was revised to adopt a more scientifically neutral tone, consistent with the reviewer's suggestion.

      (2) In abstract, the final sentence is ambitious but heavy. It may benefit from being split into two shorter sentences, for example:

      "These findings establish RNA as a potent, context-dependent modulator of both virulent amyloids and host-defense peptides. They further reveal phase transitions as tunable regulators of peptide activity and potential therapeutic targets across infectious and neurodegenerative diseases."

      As part of the broader abstract revision, the final sentence was restructured and the abstract as a whole was rewritten to improve clarity and readability, in the spirit of the reviewer's recommendation.

      (3) In the Introduction section,

      Instead of "The phenol-soluble modulins (PSMs) produced by Staphylococci contain amyloid-forming short peptides which play multiple functional roles...", consider "Staphylococcal phenol-soluble modulins (PSMs) are short, amyloidogenic peptides that perform multiple roles central to pathogenesis...".

      In accordance with the suggestion, the sentence has been revised.

      (4) To improve narrative flow in the final paragraph of the Introduction, a short bridging sentence could be added, such as:

      "Given these nucleic acid interactions, we next examined whether RNA can drive phase separation or structural reorganization of these amyloidogenic peptides."

      We thank the reviewer for this helpful suggestion. It provided an opportunity to clarify an important distinction between the two peptides studied. While LL-37 can self-assemble into higher-order α-helical structures, it is not amyloidogenic, in contrast to PSMα3. We therefore revised the bridging sentence in the final paragraph of the Introduction to read: “Given their shared cationic, amphipathic α-helical character, but distinct amyloidogenic properties, we sought to examine whether RNA differentially influences the assembly landscapes and bioactivity of PSMα3 and LL-37.”

      (5) The rationale for selecting Poly(A) and Poly(AU) would benefit from further clarification. It would be helpful to specify whether these RNAs are intended to model particular host or bacterial RNA species, such as AU-rich elements, rRNA-like sequences, or mRNA-like contexts.

      Poly(A) and Poly(AU) RNAs were selected as simplified, well-defined model RNAs to probe general peptide–RNA interactions in an unbiased manner, as no prior information was available regarding whether such interactions occur or which specific RNA species might be involved. This rationale is now clarified in the revised text (lines 128–131).

      These RNAs are not intended to represent a single biological transcript, but rather generic RNA features relevant to both host and bacterial contexts, including single-stranded homopolymeric regions and AU-rich elements commonly found in mRNAs and stress-related RNAs. The use of such reductionist RNA models to study RNA–protein interactions, phase behavior, and RNA-modulated aggregation is well established. We nevertheless agree that RNA sequence and structure may influence peptide assembly and activity, and future studies will address sequence-specific and biologically derived RNAs.

      (6) In Figure 1A, essential EMSA controls (RNA alone, peptide alone, and a nonspecific peptide or PSMα3) should be included to distinguish specific complexes from artifacts, even if presented in the supplementary information. In addition, a competition assay using unlabeled RNA would help confirm binding specificity and rule out predominantly nonspecific electrostatic interactions; these data could also be reported in the supplementary figures.

      An RNA-alone control is already included in Figure 1A of the revised version. The first lane (“0 µM”) shows free Poly(A) or Poly(AU) RNA in the absence of peptide and serves as the negative control against which PSMα3-induced mobility shifts are evaluated. A peptide-alone EMSA cannot be performed, as PSMα3 is highly cationic and does not migrate into the gel in the absence of RNA; moreover, EMSA in this format reports on RNA mobility rather than peptide migration.

      With respect to binding specificity, we compared Poly(A) and Poly(AU) RNAs and observed distinct binding behaviors, which would not be expected for purely nonspecific electrostatic interactions. In addition, the extracted Hill coefficients (>1) are consistent with cooperative binding, further arguing against simple charge-driven association. Finally, the RNA-dependent association of PSMα3 is independently supported by fluorescence microscopy and quantitative colocalization analyses, which corroborate the EMSA results. Together, these orthogonal approaches support the relevance of the observed peptide–RNA interactions.
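
      As background on the Hill coefficient mentioned above: for a fraction of RNA bound θ at peptide concentration L, the Hill equation θ = Lⁿ / (Kⁿ + Lⁿ) linearizes to log(θ/(1 − θ)) = n·log L − n·log K, and n > 1 indicates positive cooperativity. The sketch below fits this linearized form. It assumes θ has already been quantified from band densitometry and is not the fitting procedure used in the manuscript; `hill_fit` is a hypothetical helper. Nonlinear least squares on the untransformed data is generally preferable, because the log transform amplifies errors near θ = 0 and θ = 1.

```python
import numpy as np


def hill_fit(conc, frac_bound):
    """Estimate Hill coefficient n and half-saturation K from a linearized Hill plot.

    Fits log(theta / (1 - theta)) = n*log(L) - n*log(K).
    Points with theta at exactly 0 or 1 (or L <= 0) are dropped before fitting.
    """
    conc = np.asarray(conc, dtype=float)
    theta = np.asarray(frac_bound, dtype=float)
    keep = (theta > 0) & (theta < 1) & (conc > 0)
    x = np.log(conc[keep])
    y = np.log(theta[keep] / (1 - theta[keep]))
    n, intercept = np.polyfit(x, y, 1)  # slope = n, intercept = -n*log(K)
    k_half = float(np.exp(-intercept / n))
    return float(n), k_half
```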

      (7) In Figure 1B, there is a time mismatch between EMSA (30 minutes) and TEM (2 hours). If aggregation progresses over time, the EMSA pattern at 2 hours may differ. This point could be acknowledged or experimentally addressed, as RNA-peptide assemblies may evolve from liquid-like condensates to more solid aggregates.

      The EMSA and TEM experiments were intentionally performed at different time points to capture distinct stages of the PSMα3–RNA assembly process. The EMSA assay (30 minutes) was designed to probe early RNA–peptide complex formation and binding interactions, before extensive higher-order aggregation occurs. At this stage, we aim to detect mobility shifts reflecting complex formation rather than mature assemblies. In contrast, TEM was performed after 2 hours to visualize later-stage structural outcomes, including fibrillation and morphological reorganization. As aggregation progresses over time, the assemblies evolve from early RNA–peptide complexes into more ordered fibrillar structures, which are best assessed by electron microscopy at later time points. To improve clarity and avoid potential confusion, we have streamlined Figure 1 to focus on the EMSA data, which specifically addresses early binding events. The TEM data were removed from Figure 1 and are now presented in Figure 3, where later-stage structural transitions and fibrillation are shown more comprehensively and in the appropriate mechanistic context.

      (8) In Figure 1B, if feasible, complementing TEM with a confirmatory fibril assay (e.g., ThT kinetics) under the same conditions would strengthen the conclusion that the morphology difference is robust, but it is not mandatory.

      We attempted to perform ThT fibrillation kinetics under the same RNA-containing conditions; however, these assays were not informative for this system. PSMα3 aggregates extremely rapidly, producing an immediate and steep increase in ThT fluorescence (Fig. S9 in the revised version), which prevents reliable resolution of RNA-dependent differences in aggregation kinetics or lag phases. In addition, Poly(AU) RNA interferes with ThT readout through electrostatic interactions between the negatively charged RNA and the cationic dye, as well as through RNA-induced changes in fibril morphology, both of which complicate quantitative interpretation of fluorescence kinetics. Based on these technical constraints and prior experience with RNA–amyloid systems, ThT kinetics under identical RNA conditions would not provide a robust or interpretable confirmation of the morphological differences observed by TEM.

      (9) In Figure 1B, PSMα3 alone control is missing in TEM images.

      A TEM image of PSMα3 alone is included in Figure 3, where we systematically present fibrillation outcomes across different RNA concentrations alongside the peptide-only control. Figure 1 was streamlined to focus on early RNA–peptide interactions assessed by EMSA, whereas Figure 3 provides a comprehensive TEM analysis of later-stage structural outcomes. This organization was chosen to clearly separate early binding events from subsequent assembly transitions and to avoid redundant presentation of TEM images under similar conditions.

      (10) Although it is experimentally practical to focus on Poly(AU), the justification is very one-sided. The Poly(A) condition, which yields amorphous aggregates, may be equally informative for understanding toxicity, LLPS, or nonfibrillar states and could be discussed more explicitly.

      We agree that Poly(A)-induced amorphous aggregation is informative for understanding non-fibrillar assembly states. However, the primary aim of this study was to dissect RNA-dependent regulation of fibrillar assembly and phase behavior, which is most clearly captured using Poly(AU). Poly(A) was therefore included as a comparative condition rather than as a focus for detailed mechanistic analysis. A more systematic comparison of different RNA classes and their effects on non-fibrillar states and toxicity is an important direction for future work but is beyond the scope of the present study.

      (11) To improve readability of the manuscript, the main text should follow the order of the figure panels (e.g., A, B, C, D, and E) and numbers (Figure 1, 2...) sequentially, so that readers can easily align with the corresponding images.

      We have revised the manuscript to improve alignment between the main text and the figures, adjusting panel ordering and numbering where appropriate so that the text now follows the figure panels and figure numbers more sequentially. These changes were made to enhance readability while maintaining a logical visual flow within the figures.

      (12) In the result section of Figure 2, the analogy to Ddx4-like systems is a helpful concept, but should be clearly framed as an analogy, not evidence. It would be more accurate to say that the behavior is "conceptually similar to" those systems, while noting that the molecular context is significantly different.

      We have revised the text to explicitly frame the comparison to Ddx4-like systems as a conceptual analogy rather than evidence: lines 158-161 in the revised version.

      (13) In Figure 4, inclusion of positive and negative controls to validate assay performance (e.g., untreated bacteria or HeLa cells, lysis buffer, media alone) would strengthen confidence in the bioactivity measurements.

      We wish to clarify that appropriate positive and negative controls were included in all bioactivity assays and were used to normalize the data presented in Figure 4. For the HeLa cytotoxicity assay (LDH), untreated cells were used to determine spontaneous LDH release (negative control), and cells treated with the manufacturer-supplied lysis buffer were used to determine maximum LDH release (positive control). The percent cytotoxicity shown in Figure 4B was calculated relative to these internal controls, as described in the Methods. For the antibacterial assay (PrestoBlue), wells containing E. coli without peptide served as the positive control for 100% viability, while wells containing sterile LB medium alone were used as blanks. Viability values in Figure 4A were normalized to these controls. We have ensured that the Methods section explicitly describes these controls to reinforce confidence in the bioactivity measurements.
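
      The control-based normalizations described above correspond to the standard two-point rescalings sketched below. This sketch assumes that plate readings are already blank-corrected where the assay kit requires it. The Methods give the exact procedure, and the function names here are hypothetical.

```python
def percent_cytotoxicity(sample: float, spontaneous: float, maximum: float) -> float:
    """LDH release rescaled so untreated cells = 0% and lysis-buffer cells = 100%."""
    return 100.0 * (sample - spontaneous) / (maximum - spontaneous)


def percent_viability(sample: float, untreated: float, blank: float) -> float:
    """PrestoBlue signal rescaled so medium-only blank = 0% and peptide-free E. coli = 100%."""
    return 100.0 * (sample - blank) / (untreated - blank)
```

      Both are linear rescalings between two internal reference points, so values slightly below 0% or above 100% can occur when a sample falls outside the control range.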

      (14) To enhance clarity, consider presenting the RNA concentration and time-dependent effects on PSMα3 bioactivity in a comparison table within the main text or as a supplementary figure.

      We appreciate this suggestion and carefully considered presenting the data in tabular form. However, we found that graphical representation more effectively conveys the trends, transitions, and comparative patterns between conditions. A table would not adequately capture these relationships.

      Reviewer #2 (Recommendations for the authors):

      Further remarks:

      (1) Circumstantial evidence based on the "amyloid inhibitor", EGCG: The results with EGCG, which has been shown to have a moderate amyloid-reducing effect on PSMalpha1 and PSMalpha4, should not be taken as evidence for amyloid-based cytotoxicity. While increased concentrations of EGCG reduced the cytotoxic effect of PSMalpha3, it is not convincingly shown that this is due to a lower concentration of amyloid vs. monomeric PSM.

      We agree that the effects of EGCG should not be interpreted as evidence for amyloid fibrils being the cytotoxic species. Our data instead support a mechanism in which EGCG primarily targets soluble PSMα3, thereby redirecting its assembly pathway and depleting bioactive species. Specifically, solution-state NMR (Fig. 7) shows that EGCG binds defined residues of monomeric PSMα3, consistent with sequestration of soluble peptide rather than selective inhibition of fibrils. Complementary light and electron microscopy, together with kinetic measurements, indicate that EGCG does not simply stabilize monomers but instead diverts PSMα3 into amorphous, non-functional aggregates, as visualized by TEM (Fig. 6B) and reflected in altered ThT responses (Fig. S9). Importantly, these EGCG-induced aggregates are non-cytotoxic (Fig. 6A/C) and fail to associate with membranes or cells, in contrast to untreated PSMα3, which forms membrane-associated assemblies and induces disruption (newly added Movies S1-S2). Thus, EGCG potentially reduces cytotoxicity by remodeling the aggregation landscape and depleting active soluble species, rather than by selectively inhibiting specific fibril formation. This clarification is now added to the Discussion in lines 554-564 of the revised version.

      (2) It is appreciated that the authors refrain from presenting the unsubstantiated concept of "functional" PSM amyloids in the Discussion. However, wording in that direction must also be removed from other parts of the manuscript (e.g., "bioactive fibrillar polymorphs", "The formation of cross-alpha amyloids has been correlated with toxic activity", etc.). The authors should generally refrain from uncritically implying that amyloid formation underlies PSM biological activity, and should instead discuss that the much more likely explanation of the findings is a lowering of the concentration of cytolytically active, monomeric PSM.

      As detailed in our response to Major Comment #1, we agree that uncritical language implying that amyloid fibrils themselves are the cytotoxic species should be avoided. Accordingly, we have revised the manuscript to consistently frame amyloid formation in regulatory terms. Aggregation, depending on context, modulates activity by altering the availability, persistence, and assembly pathways of these species. Distinct aggregation states are therefore presented as correlated with, but not equivalent to, cytotoxic activity, and as components of a dynamic assembly landscape rather than as direct toxic entities.

      (3) Discussion: "PSM alpha3 interaction with nucleic acids within human cells ...supports a comparable mechanism...". Please delete this as it is unsubstantiated.

      We agree that the original phrasing overstated the evidence. The sentence was removed and the Discussion was revised to clearly frame nucleolar accumulation as a phenomenological observation reflecting PSMα3's intrinsic nucleic acid–binding capacity, rather than as evidence for a comparable intracellular mechanism. Specifically, the revised Discussion (lines 597–607) states that nucleolar localization is "unlikely to represent a distinct intracellular toxic mechanism" and instead "reflects binding competence within RNA-rich compartments following cellular entry." The biological relevance of this interaction, particularly at sub-cytotoxic concentrations, is noted as an open question requiring further investigation.

      (4) The authors should also cite papers that have argued against their central hypothesis of "functional" PSM amyloids.

      We thank the reviewer for this suggestion. Accordingly, we have revised the manuscript to explicitly cite and discuss studies that argue against amyloid fibrils as the primary cytotoxic species, and that instead attribute PSM cytotoxicity to soluble or membrane-associated forms. These perspectives are now incorporated in the Discussion to provide a balanced view of the field and to clarify how our findings align with, and differ from, existing models of PSM activity.

    1. eLife Assessment

      This important work advances our understanding of the development of the visual system. The data presented is compelling and provides a detailed single-cell atlas of post-natal anterior chamber development in mice, highlighting the trabecular meshwork and Schlemm's canal.

    2. Reviewer #2 (Public review):

      Summary:

      This study presents a detailed single-cell transcriptomic analysis of the post-natal development of mouse anterior chamber tissues. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adult. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).

      Comments on revisions:

      My critiques have been adequately addressed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a comprehensive single-cell atlas of mouse anterior segment development, focusing on the trabecular meshwork and Schlemm's canal. The authors profiled ~130,000 cells across seven postnatal stages, providing detailed and solid characterization of cell types, developmental trajectories, and molecular programs.

      Strengths:

      The manuscript is well-written, with a clear structure and thorough introduction of previous literature, providing a strong context for the study. The characterization of cell types is detailed and robust, supported by both established and novel marker genes as well as experimental validation. The developmental model proposed is intriguing and well supported by the evidence. The study will serve as a valuable reference for researchers investigating anterior segment developmental mechanisms. Additionally, the discussion effectively situates the findings within the broader field, emphasizing their significance and potential impact for developmental biologists studying the visual system.

      Weaknesses:

      The weaknesses of the study are minor and addressable. As the study focuses on the mouse anterior segment, a brief discussion of potential human relevance would strengthen the work by relating the findings to human anterior segment cell types, developmental mechanisms, and possible implications for human eye disease. Data availability is currently limited, which restricts immediate use by the community. Similarly, the analysis code is not yet accessible, limiting the ability to reproduce and validate the computational analyses presented in the study.

      In the revised version, we have added a paragraph to the Discussion highlighting the human relevance of our work. Additionally, the data are publicly available on the Single Cell Portal and GEO, and the accession numbers have been updated. The code is available on GitHub (https://github.com/revathi-balasubramanian/Anterior-segment-development-single-cell-data-analysis).

      Reviewer #2 (Public review):

      Summary:

      This study presents a detailed single-cell transcriptomic analysis of the postnatal development of mouse anterior chamber tissues. Analysis focused on the development of cells that comprise Schlemm's Canal (SC) and trabecular meshwork (TM).

      Strengths:

      This developmental atlas represents a valuable resource for the research community. The dataset is robust, consisting of ~130,000 cells collected across seven time points from early post-natal development to adulthood. Analyses reveal developmental dynamics of SC and TM populations and describe the developmental expression patterns of genes associated with glaucoma.

      Weaknesses:

      (1) Throughout the paper, the authors place significant weight on the spatial relationships of UMAP clusters, which can be misleading (see Chari and Pachter, PLoS Comput Biol 2023). This is perhaps most evident in the assessment of vascular progenitors (VP) giving rise to BEC and SEC types (Figures 4 and 5). In the text, VPs are described as a common progenitor for these types; however, the trajectory analysis in Figure 5 denotes a path of PEC -> BEC -> VP -> SEC. These two findings are incongruous and should be reconciled. The limitations of inferring relationships based on UMAP spatial positions should be noted.

      (2) Figure 2d does not include P60. It is also noted that technical variation resulted in fewer TM3 cells at P21; was this due to challenges in isolation? What is the expected proportion of TM3 cells at this stage?

      (3) In Figures 3a and b it is difficult to discern the morphological changes described in the text. Could features of the image be quantified or annotated to highlight morphological features?

      (4) Given the limited number of markers available to identify SC and TM populations during development, it would be useful to provide a table describing potential new markers identified in this study.

      (5) The paper introduces developmental glaucoma (DG), namely Axenfeld-Rieger syndrome and Peters Anomaly, but the expression analysis (Figure S20) does not annotate which genes are associated with DG.

      (1) We agree that inferring biological relationships from the spatial arrangement of UMAP clusters has limitations, and we have qualified our interpretation accordingly in the text. We have also added clarifying language to the trajectory analysis in Figure 5. The intended developmental trajectory is PEC → VP → BEC and SEC; however, the cluster labels in Figure 5 were applied incorrectly. Specifically, the "VP, BECs" cluster was mislabeled as "BECs", which led to the confusion. This cluster contains VPs that transition into BECs as well as VPs that are precursors to SECs.

      (2) We recently published the P60 dataset separately (Tolman, Li, Balasubramanian et al., eLife 2025); these data consist of integrated single-nucleus multiome profiles that were subjected to in-depth analysis. Additionally, we found that integrating the P60 dataset with the developmental datasets obscured sub-clustering of mature cell types. In future manuscripts, we will pursue a more detailed analysis of TM development and perform time point–specific clustering, similar to the approach we used for endothelial cells (Figure 4e).

      Comparing proportions of cells across ages, as the eye grows, needs to be done cautiously. Notwithstanding these limitations, the proportions of the TM1, TM2, and TM3 clusters are expected to be similar between P14 and P21, since the proportions at P14 are similar to those in the separately analyzed P60 data. Importantly, our dissection strategy changed with age: from P2 to P14, we removed approximately one-third of the cornea, whereas at P21 and P60 we removed most of the cornea to help maximize representation of limbal cells as the eyes grew. This change in dissection likely contributed to the reduced number of TM3 cells observed at P21. TM3 cells are enriched anteriorly (at least in the adult) and so lie closer to the corneal cut. P21 eyes, although larger than those at younger ages, are still small and more delicate to dissect accurately than P60 eyes, so TM3 cells are more likely to be lost at this stage. Additional details are provided in the Methods section, and the caveats surrounding our dissection method have now been included.

      (3) For Figure 3a and b, we have now pseudo-colored the spaces and provided a quantification of how both TM volume and intratrabecular spaces change with developing age (Figure 3c).

      (4) We have now included a supplemental table of markers for developing and mature TM and SC cell types (Table S3).

      (5) We have highlighted DG genes in rectangular boxes in Figure S20.

    1. eLife Assessment

      This study provides a useful demonstration that, at least for the systems examined, aspects of the entropic contribution to protein-ligand binding can be inferred directly from crystallographic data. In doing so, it strengthens a view of crystal structures as heterogeneous ensembles that are amenable to statistical-mechanical analysis rather than purely static models. The analytical approaches are carefully developed and transparently discussed, with thoughtful consideration of both successful and less effective methods, lending solid support to the central conclusions. However, because the analysis is based on a relatively small and narrowly sampled set of protein-ligand complexes, the generality of these findings remains speculative and will require broader validation.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that if they generate a weighted multi-conformer ensemble of structural models to fit crystallographic electron density data, the application of statistical mechanical methodologies to that ensemble can provide reasonable estimates of configurational entropy terms related to protein-ligand binding.

      Strengths:

      A fair range of proteins (12) and ligands (70) is included in the study. The analytical methodologies are well described. Both successful and less successful analytical approaches are discussed, and the latter are frequently as insightful as the former.

      Weaknesses:

      Compared to the universe of protein-ligand complexes, this dataset is inevitably very limited, so the generality of the observations made here remains speculative. Though a fair range of proteins is studied, the dynamic range in the binding affinity data is limited. The practical utility of the approach is never really commented on.

    3. Reviewer #2 (Public review):

      The manuscript by Miller and Wankowicz (M&W) develops a crystallographic approach to predict the contribution of protein conformational entropy to the total binding entropy using multi-conformer ensemble models. The approach loosely follows the path developed by Wand using NMR relaxation methods. Their approach is to generate local crystallographic order parameters (analogous to NMR order parameters) to estimate protein conformational entropy and then combine this with statements about water entropy. The static view of the ensemble is perhaps easier to grasp, with respect to entropy, than the NMR-based dynamical view. This approach is potentially ground-breaking and of great importance given the ease, relative to NMR, with which the source data can be obtained. However, the approach has several deficiencies, only some of which are noted by the authors.

      Like the initial Wand approach (Frederick et al Nature, 2007), M&W develop a simple counting relationship between members of the ensemble and a statement about conformational entropy. For reasons that are not clear, M&W utilize "per residue" scaling, which was initially introduced by Wand but later discarded for the more physically meaningful "per torsion angle" scaling. As noted in the Nature 2007 paper, this assumes uncorrelated occupancy. The current Wand approach (Caro et al PNAS, 2017) subsumes correlated occupancy and potentially incomplete sampling of the ensemble into an empirically determined scaling parameter (sd). This is likely a major contributor to the mysterious 1/4 scaling factor that is introduced. It is not clear to me how discrete conformational states are counted from the qFit models. Using the B-factor, as opposed to a thermal factor, to account for motion in a rotamer well seems suspect. With some irony, M&W only look at chi-1 rotamers in distinct contrast to the NMR approach, which looks at the end of the side chain, which captures the entire disorder. On the other hand, the crystallographic approach "sees" all side chains, whereas the NMR approach, as currently rendered, looks only at methyl-bearing side chains and requires coupling to neighbors to report on all side chains (see Kasinath JACS 2013 and Wand & Sharp ARB 2018).

      Nevertheless, as noted in Nature 2007, the fact that a linear relationship is seen between the apparent conformational entropy and the total binding entropy suggests that the former is a major component of the latter. It also reinforces the idea that dSrt is constant for higher-affinity complexes, i.e., residual rigid-body motion of the protein relative to the ligand is limited (a conclusion reached in PNAS 2017), but this is not mentioned. This is an important result.

      The classic hydrophobic effect is potentially a significant component of total binding entropy. Here, the manuscript falls flat by focusing on crystallographically resolved waters. As shown in site-resolved detail (Nucci et al, NSMB 2011 and others), hydration water has a range of residual motion (entropy) that will modulate contributions to water entropy upon displacement from an interface. A very clear example of the potential for large contributions was demonstrated in the wet interface of a barnase-DNA complex (PNAS 2017). The fact that the classic dASA treatment failed, I think, points to problems elsewhere in the approach.

      I note that the range of ligand types explored by M&W is quite limited compared to PNAS 2017, making generalization somewhat difficult (see Wand, Curr. Opin. Struct. Biol. 2013 for why this is important). Finally, it is disappointing that the authors chose not to examine systems common to PNAS 2017, making direct comparison to the NMR method impossible.

      In summary, this manuscript sets the field in a new direction. It is a first serious look at conformational entropy using crystallographic approaches. If fully validated, this approach would permit an explosion of insight, since crystallography is now straightforward, very fast, and capable of approaching larger systems relative to the NMR approach. However, a quantitative element is missing: a formal relationship that is fitted to the data. I do not think this is a fatal flaw for this manuscript, though. If the supplementary material is improved for clarity and completeness (e.g., by including tables of thermodynamic data, conformer analysis, and B-factors) such that all figures could be independently reproduced and therefore analyzed in different ways, and the comments made above are addressed, if not resolved, then I think this manuscript could become a keystone for this new direction.

    1. eLife Assessment

      This study provides valuable insights into how cells maintain sphingolipid homeostasis through transcriptional control and regulated protein degradation in response to changes in sphingolipid levels. The evidence supporting the conclusions is convincing overall, with solid genetic and biochemical approaches, while some mechanistic aspects remain to be clarified. This work will be of interest to researchers studying lipid metabolism and membrane biology.

    2. Reviewer #1 (Public review):

      Matsumoto et al. identify Com2, a C2H2-type zinc finger transcription factor not previously linked to sphingolipid metabolism, as a regulator of this pathway in budding yeast. They show that depletion of sphingolipids by myriocin, an inhibitor of serine palmitoyl transferase, increases Com2 expression. This, in turn, promotes the expression of the protein kinase Ypk1 and enhances TORC2-dependent phosphorylation of Ypk1. The authors identify a Com2-binding site in the YPK1 promoter and provide evidence that Com2 functions upstream of Ypk1 to regulate its expression. They further report that Com2 abundance is controlled by the ubiquitin-proteasome system: degradation of Com2 is inhibited by myriocin treatment and enhanced by phytosphingosine. Mutational analyses of putative phosphorylation and ubiquitination sites support a role for these modifications in regulating Com2 stability. Based on these findings, the authors propose that Com2 acts as a transcriptional regulator of sphingolipid metabolism that responds to sphingolipid levels and promotes Ypk1 expression.

      Strengths:

      This study provides a valuable finding on the regulation of sphingolipid synthesis by the transcription factor Com2 in budding yeast. The evidence supporting the authors' claims is solid, although additional evidence clarifying the mechanisms and biological significance of ubiquitin-proteasome-mediated degradation of Com2 would strengthen the work. This work will be of interest to microbiologists studying budding yeast.

      Weaknesses:

      The biological significance of Com2 degradation is not sufficiently clear, which represents an important limitation of the study. It would also be important to determine whether Com2 is actively degraded under normal growth conditions, such as during logarithmic growth in the absence of drug treatment.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Matsumoto and co-workers use budding yeast as a model organism to identify and characterize transcriptional mechanisms that homeostatically regulate sphingolipid metabolism. Through a genetic suppressor screen and a series of genetic, molecular, and biochemical analyses, they identify the transcription factor Com2 as a key regulator that responds to sphingolipid levels and regulates the expression of genes such as YPK1, which in turn controls the activity of several enzymes in the yeast sphingolipid biosynthetic pathway.

      Com2 itself is further regulated by the ubiquitin proteasome system in response to sphingolipid levels. High sphingolipid levels promote proteasomal degradation of Com2, whereas low sphingolipid levels stabilize Com2. These findings suggest that Com2 is a central component of a feedback system that helps maintain sphingolipid homeostasis.

      Strengths:

      The identification of Com2 as an upstream regulator of the TORC2-Ypk1 pathway is supported by multiple orthogonal lines of evidence. The authors also provide mechanistic insight into how Com2 protein levels are dynamically controlled through phosphorylation and ubiquitin-mediated degradation. Stabilization of Com2 in response to sphingolipid depletion appears to be required for the transcriptional upregulation of YPK1 expression.

      Weaknesses:

      Although several important questions remain unresolved, such as which kinases function upstream of Com2 and which ubiquitin ligase(s) target Com2, this work is nevertheless likely to have a meaningful impact on the field of sphingolipid metabolism. The identification of a regulated transcription factor that responds to sphingolipid levels may also be of broader interest to researchers studying membrane homeostasis.

    4. Reviewer #3 (Public review):

      This paper extends the authors' 2022 studies of how the synthesis of membrane sphingolipids is regulated in budding yeast. Here, they hypothesized that overexpression of a protein involved in sphingolipid (SL) biosynthesis would confer resistance of lip1-1 cells, which are Dox-inducibly defective in expression of a ceramide synthase regulatory subunit, to myriocin (Myr), a serine palmitoyltransferase inhibitor that inhibits SL synthesis. To test this idea, they transformed lip1-1 cells with a multi-copy genomic library, selecting for Myr resistance. Apart from LIP1 itself and YPK1, which encodes a protein kinase downstream of TORC2, COM2, which encodes the Com2 C2H2-type zinc finger transcription factor, was the most frequent hit in the screen. They went on to show that com2Δ cells exhibited Myr sensitivity, and that Com2 protein expression was induced under conditions that reduced complex sphingolipid synthesis, such as Myr treatment. Using ypk1-as ypk2Δ cells and 3-MB-PP1, a selective inhibitor of the Ypk1-as kinase, they showed that Com2 phosphorylation was independent of Ypk1 activity, suggesting that Ypk1 lies downstream of Com2. Consistently, Myr treatment, which reduces SL synthesis, resulted in an increase in both Com2 and Ypk1 proteins. By generating a Ptet-off-GFP-COM2 strain, they showed that when Dox was removed to induce GFP-Com2 overexpression, Myr resistance was increased. They went on to show that Com2 binds to a Com2 response element in the YPK1 promoter and drives expression of Ypk1. This was confirmed by showing that expression of a YPK1-driven lacZ reporter gene was also elevated when GFP-Com2 overexpression was induced. CRISPR deletion of the putative Com2-binding site (CBS) from the endogenous YPK1 promoter was used to generate PYPK1-ΔCBS cells, which showed a significant reduction in Ypk1 expression and exhibited intermediate Myr sensitivity, suggesting that Com2 is important for, but not the only regulator of, Ypk1 expression.
Analysis of SL levels showed that they largely paralleled the levels of Ypk1 protein and active pT662 Ypk1. Using deletion analysis of the COM2 gene, they showed that residues 2-190 and the C-terminal DNA-binding domain of Com2 were essential for Com2 function in the SL synthesis pathway. Deletion of ≥40 amino acids from the N-terminus increased expression of Com2 protein irrespective of Myr treatment, suggesting that Com2 protein levels are regulated by protein stability. Consistently, they found that the high level of Com2 protein induced by Myr was rapidly reversed by treatment with phytosphingosine (PHS), a ceramide precursor that bypasses the Myr-blocked step and restores SL synthesis. The reduction in Com2 protein upon PHS treatment was prevented by the proteasome inhibitor MG132 and led to the accumulation of polyUb-Com2 species, consistent with Com2 being negatively regulated by SL-induced UPS-mediated degradation. Based on the use of selective inhibitors of different steps in SL synthesis, they showed that SL biosynthesis up to the level of MIPC (mannosyldiinositol phosphorylceramide) is required for the SL-mediated degradation response. Based on individual and combined K-to-R mutagenesis of the three Lys in Com2 1-49, they showed that K23, K35 and K51 in combination are needed for PHS-induced Com2 degradation, and therefore are likely to be the main Com2 Ub sites. Finally, they observed that PHS induced an increase in K3R Com2 phosphorylation, and found that an S/T10A mutant was only weakly phosphorylated and was resistant to PHS-induced degradation, suggesting that phosphorylation of Com2 is required for PHS-dependent degradation.

      The paper is clearly written, and the data in Figures 1-6 show convincingly that the Com2 zinc finger protein, by inducing the expression of a set of genes, including YPK1 and LCB1, plays an important role in sphingolipid (SL) homeostasis in yeast under conditions when sphingolipid levels are low. However, the data in Figures 7 and 8, where the authors provide evidence that the Com2 protein was rapidly degraded in a proteasome-dependent manner in response to phytosphingosine (PHS) treatment, dependent on the N-terminal 40 residues of Com2 and a combination of three Lys residues in this region, are intriguing but incomplete. There are a number of issues, including the identity of the Com2 ubiquitylation sites. They showed that the K23/35/51R Com2 mutant was stabilized, but did they provide direct evidence that these three Lys are in fact ubiquitylated (e.g., GG-K peptide enrichment-based MS analysis of Ub-Com2 from PHS-treated, MG132-treated cells)? They showed that PHS treatment increased Myc13-tagged Com2 ubiquitylation in the presence of MG132, but did not show that the K3R Com2 mutant (or the S/T10A phosphorylation site Com2 mutant) failed to be ubiquitylated. They also found that the WT Com2 and particularly the K3R Com2 mutant protein exhibited hyperphosphorylation in response to PHS treatment, and that mutation of 10 potential pSer sites to Ala abolished this effect and stabilized the Com2 protein. However, it is unclear whether the K3R mutation led to increased Com2 hyperphosphorylation per se following PHS treatment, or whether this is because there is more K3R protein, as they suggest might be the case. It is also not clear what protein kinase is responsible or how it might be activated when SL levels are high.
In addition, the E3 Ub ligase needed for Com2 degradation was not identified, and it is not clear whether Com2 phosphorylation is directly involved in its recognition by a phosphodependent E3 Ub ligase, as they propose in the model shown in Figure 9. Finally, and perhaps most importantly, it is unclear how elevated levels of phytosphingosine or any sphingolipid are sensed by the Com2 pathway in order to switch on the degradation response as a negative feedback event. The model depicted in Figure 9 exposes all of these unknowns. The paper would be significantly strengthened by additional experiments defining how complex SL levels are sensed, how Com2 is phosphorylated in response to SL sensor signals, and how (phospho)Com2 is recognized for ubiquitylation and degradation.

      In summary, the finding that the Com2 zinc finger transcription factor is an upstream regulator of the sphingolipid biosynthesis pathway in budding yeast, acting as part of an SL sensor system to maintain sphingolipid homeostasis, is new and potentially important. However, more mechanistic work needs to be done to address the unanswered questions raised by the data in Figures 7 and 8.

    1. eLife Assessment

      This study presents important findings on the molecular mechanisms governing how the natural killer cell receptor KIR2DL4 interacts with HLA-G and undergoes internalization. The authors provide solid evidence for an allosteric disulfide-bond switch that regulates receptor activity, using a multifaceted approach that includes mutagenesis, mass spectrometry, and imaging. The work would be further strengthened by validating these mechanisms in primary immune cells and providing direct structural evidence for the proposed ligand-binding interface.

    2. Reviewer #1 (Public review):

      Summary:

      This paper asks how the NK cell receptor KIR2DL4 binds HLA-G and undergoes endocytosis. The authors propose that an allosteric disulfide-bond switch controls whether the receptor is in a ligand-binding or non-binding state, and they support this model using mutagenesis, imaging, mass spectrometry, and structural prediction.

      Strengths:

      A major strength is the use of diverse, complementary approaches to validate the central claim. The authors combined unbiased random mutagenesis to identify key residues, confocal microscopy to track cellular localization, and mass spectrometry to quantify the redox states of specific disulfide bonds. These methods consistently support a single model: an allosteric disulfide switch. The transition between a Cys10-Cys28 bond and a Cys28-Cys74 bond serves as a functional switch that controls whether the receptor resides at the plasma membrane to bind ligand or remains inactive in endosomes.

      Weaknesses:

      The core model is interesting, but some of the strongest mechanistic claims still rely heavily on structure prediction rather than direct structural evidence, especially the proposed HLA-G contact surface in Figure 6.

      The paper supports an effect of the disulfide state on trafficking and uptake, but the case for direct KIR2DL4-HLA-G binding still feels somewhat indirect. The manuscript itself notes that direct binding had not been previously shown, and the current explanation partly depends on inference about which disulfide state is present.

      Most of the main experiments are done in transfected 293T cells, so it is still not fully clear how strongly this mechanism carries over to the more relevant NK-cell setting discussed in the paper.

      The cellular evidence for the PDI story is not specific, since it depends a lot on inhibitor and blocking experiments that could affect the broader extracellular redox environment.

    3. Reviewer #2 (Public review):

      Summary:

      Rajagopalan et al show how extracellular domain features regulate KIR2DL4 internalization. The trafficking phenotypes of cysteine mutants are logically organized, and well-summarized in a Table. The disulfide mapping and differential alkylation strategy are appropriate and provide strong support for alternative disulfide configurations in D0. The higher accessibility or more selective reduction of Cys10-Cys28 as compared to Cys28-Cys74 by PDI is a key mechanistic anchor.

      Strengths:

      The identification of a conformational switch in KIR2DL4 is conceptually novel. The work is experimentally elegant, detailed, and well written.

      Weaknesses:

      Most of the mechanistic work was performed in HEK293 cells. The authors should demonstrate physiological relevance using primary NK cells.

    1. eLife Assessment

      This study shows that Znhit1, a regulator of chromatin and of the histone variant H2A.Z, is required for progression through meiotic prophase. It is an important observation that describes the role of epigenetics and gene expression during meiosis. The analysis is based on complementary approaches at the cytological, single-cell, and genomic levels that provide solid evidence for the role of Znhit1 in the control of gene expression and in the loading of H2A.Z in mouse spermatocytes.

    2. Reviewer #1 (Public review):

      Summary:

      Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 impairs pachytene transcriptional activation (PGA). Znhit1 is a subunit of the SRCAP chromatin remodeling complex and is required for H2A.Z deposition, and in cKO spermatocytes, H2A.Z is not deposited into gene regions. The authors claim that this is why PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during meiotic prophase.

      Strengths:

      The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.

      Comments on revisions:

      Sun et al. have responded to each comment with great care and sincerity, and substantial improvements are evident.

      In particular, the addition of scRNA-seq data from P35 samples appears to play an important role in supporting the authors' claims.

      However, there is still room for improvement in the reanalysis of the data and in the Discussion section.

      From the data perspective, for example, the authors state in line 347 of the revised manuscript that "We found that Znhit1-deficient spermatocytes phenocopied abnormal meiotic phenotypes observed in A-MYB mutants." However, the corresponding descriptions in the main text and figure legends are not sufficiently detailed, and therefore do not fully support or substantiate this interpretation. Incorporating a statistical comparison between DEGs in Znhit1-sKO and A-myb KO would likely strengthen this point.

      Regarding the overall structure of the Discussion, the connections among delayed DSB repair, MSCI, and PGA regulation via H2A.Z remain somewhat descriptive and difficult to follow. This may reflect a lack of direct evidence linking these processes; however, a more logically structured and clearly articulated Discussion would improve clarity.

    3. Reviewer #2 (Public review):

      Summary:

      The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.

      Strengths:

      The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.

      Comments on revisions:

      In the revision, the authors have addressed most of my comments. The only incomplete one is comment 1, where I asked them to define the stage of germ cell arrest by histology. I requested this because the stage of arrest they identified is so unique. They didn't do it, and instead used the scRNAseq to show a depletion at the late pachytene stage onwards. I guess it supports their main findings, but it's a bit disappointing.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Sun et al. generated germline-specific cKO mice for the Znhit1 gene and examined its effect on male meiosis. The authors found that the loss of Znhit1 affects pachytene genome activation (PGA). Znhit1 is a subunit of the SRCAP chromatin remodeling complex, which deposits H2AZ, and in cKO spermatocytes H2AZ is not deposited at gene regions. The authors claim that this is why PGA was not activated. These findings provide important insights into the mechanisms of transcriptional regulation during meiotic prophase.

      Strengths:

      The authors used samples from their original mouse model, analyzing both the epigenome and the transcriptome in detail using diverse NGS analyses to gain new insights into PGA. The quality of the results appeared excellent.

      Weaknesses:

      Overall, the data are inconsistent with the authors' claims and do not support their final conclusions. In addition, the samples used may not be the most suitable for the analysis; more suitable samples would dramatically improve the overall quality of the paper.

      Thank you for your comprehensive summary of our study and your thoughtful insights into its strengths and weaknesses. We greatly appreciate this valuable feedback, which helps us further improve our work. Below, we provide a detailed response addressing each of the points you raised.

      Reviewer #1 (Recommendations For The Authors):

      Major revisions:

      Surprisingly, many genes were upregulated in the scRNA-seq results. How many XY-linked genes are included? Discuss why many genes are upregulated in Fig. 5E, whereas bulk RNA-seq showed that only 70 genes were downregulated. Since apoptosis-related factors are upregulated in Fig. 5E, could these upregulated genes reflect a high contribution from the transcriptomes of dead cells? Once cell death begins, the transcriptome is disrupted randomly and drastically, so we think it is not desirable to analyze transcriptomes with dead cells in the mix. Describe this point appropriately in the text or generate new data without dead cells.

      We sincerely appreciate the reviewer’s critical points. Below, we address each point sequentially:

      (1) To address the question about XY-linked genes, we used the scRNA-seq data to identify differentially expressed sex chromosome genes in spermatocytes at different stages. This analysis revealed aberrant activation of XY-linked genes relative to controls: 120 XY-linked genes were aberrantly activated in zygotene-stage spermatocytes, and 119 in pachytene-stage spermatocytes (revised Fig. 4F). This observation directly indicates that Znhit1 knockout impairs meiotic sex chromosome inactivation (MSCI), a finding that aligns with our prior characterization of XY chromosome synapsis defects in Znhit1-deficient spermatocytes.

      (2) Two key reasons explain the discrepancy between scRNA-seq and bulk RNA-seq results:

      First, the scRNA-seq analysis used a more permissive log2 fold change (log2FC) threshold of 0.25 for identifying DEGs, which increases sensitivity to subtle expression changes and allows more upregulated genes to be detected. In contrast, the bulk RNA-seq analysis used a stricter threshold (log2FC of 1), which filters out these subtly upregulated transcripts and yields fewer DEGs overall.

      Second, scRNA-seq can capture differential expression specific to cell subsets, whereas bulk RNA-seq averages signals across mixed cell populations, masking such subset-specific changes.

      These clarifications have been included in the Data Analysis section of the revised manuscript.
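      To illustrate the first point, the following minimal Python sketch counts DEGs on hypothetical, made-up log2FC values (not data from this study) under the two thresholds. It assumes a gene is called when |log2FC| meets the cut-off and omits the adjusted p-value filters that real scRNA-seq and bulk RNA-seq pipelines also apply; the function name count_degs is illustrative only.

```python
# Illustrative sketch: how the log2FC cut-off alone changes the number of DEGs.
# The values below are hypothetical toy numbers, NOT data from this study.
# Assumption: a gene is called when |log2FC| >= threshold; p-value filters are omitted.

def count_degs(log2fc_values, threshold):
    """Return (n_up, n_down): genes with log2FC >= threshold or <= -threshold."""
    n_up = sum(1 for v in log2fc_values if v >= threshold)
    n_down = sum(1 for v in log2fc_values if v <= -threshold)
    return n_up, n_down


# Hypothetical per-gene log2FC values (knockout vs. control)
toy_log2fc = [0.3, 0.4, 0.6, 1.2, -0.3, -1.5, -2.0, 0.1, -0.1]

permissive = count_degs(toy_log2fc, 0.25)  # scRNA-seq-style cut-off
stringent = count_degs(toy_log2fc, 1.0)    # bulk RNA-seq-style cut-off
print("|log2FC| >= 0.25:", permissive)     # (4, 3)
print("|log2FC| >= 1.0:", stringent)       # (1, 2)
```

      With the same toy values, the permissive cut-off calls more genes in both directions than the stringent one, which matches the direction of the discrepancy described above.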

      (3) We fully agree with the reviewer’s concern that dead cells could confound transcriptomic analyses. Before downstream analysis, we excluded non-viable cells via stringent QC: cells with mitochondrial RNA (mtRNA) content exceeding 15% were removed, as high mtRNA content is a well-established marker of cell death or compromised viability. To further validate that upregulated genes were not driven by dead cell contamination, we analyzed the correlation between the expression of apoptosis-related genes and mtRNA fractions in our data. This analysis revealed no significant correlation (Pearson correlation coefficient, r = -0.02; please see Author response image 1). These results collectively rule out dead cell transcriptome contamination as the primary cause of the observed gene upregulation.

      Author response image 1.

      Scatter plot showing the Pearson correlation between apoptosis-related gene expression and mitochondrial RNA fraction in the scRNA-seq data.
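      The QC and correlation check described in point (3) can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the per-cell records, the apoptosis_score field (for example, a mean expression score over apoptosis-related genes), and all values are made up. The sketch removes cells whose mitochondrial RNA fraction exceeds 15% and then computes a Pearson correlation on the retained cells.

```python
import math

# Cells with more than 15% mitochondrial RNA are treated as non-viable and removed.
MT_FRACTION_CUTOFF = 0.15


def filter_cells(cells, cutoff=MT_FRACTION_CUTOFF):
    """Keep only cells whose mitochondrial RNA fraction does not exceed the cutoff."""
    return [c for c in cells if c["mt_fraction"] <= cutoff]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical cells (made-up values): mitochondrial fraction and apoptosis score
cells = [
    {"mt_fraction": 0.02, "apoptosis_score": 0.5},
    {"mt_fraction": 0.05, "apoptosis_score": 0.2},
    {"mt_fraction": 0.08, "apoptosis_score": 0.4},
    {"mt_fraction": 0.12, "apoptosis_score": 0.3},
    {"mt_fraction": 0.30, "apoptosis_score": 0.9},  # fails QC and is removed
]

kept = filter_cells(cells)
r = pearson_r([c["mt_fraction"] for c in kept],
              [c["apoptosis_score"] for c in kept])
print(f"{len(kept)} cells kept, Pearson r = {r:.3f}")
```

      In this sketch the correlation is computed only on cells that pass QC, which is one reading of the procedure described above; a value of r near zero on real data would be consistent with the reported lack of association.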

      Line 280-286: The data in Figures 7I and J are confusing. As shown by KAS-seq, it is expected that ssDNA is not formed in the promoter region in the Znhit1-cKO sample because transcription does not proceed, but why is ssDNA formed in the enhancer region in the control in the first place, and then lost in the Znhit1-cKO sample? Enhancer regions, including super-enhancers, are generally thought not to undergo double-stranded DNA dissociation and thus not to form ssDNA. Discuss, with appropriate citations, why the loss of ssDNA in the enhancer region affects transcription. Also, using the RNA-seq data, show whether genes downstream of promoter regions that lose ssDNA have abnormal transcriptional activity. Furthermore, explain why chromatin in the region shown in Figure 7I is even more open in Znhit1-cKO, as shown by ATAC-seq. Discuss whether this is related to transcriptional progression or to aberrant substitution with H2A. If the function of ZNHIT1 is to replace H2A with H2AZ for PGA, it is not necessary to show the H2A level in Znhit1-cKO.

      We appreciate the reviewer’s constructive comments.

      (1) ssDNA dynamics in enhancer regions: Emerging evidence demonstrates that active enhancers undergo transient DNA unwinding to form ssDNA, a process critical for transcriptional regulation through the transcription of enhancer RNAs (eRNAs). KAS-seq is sufficiently sensitive to detect ssDNA in enhancer regions (Kim et al., 2010; Wu et al., 2020). It has been shown that H2A.Z (deposited by the ZNHIT1-SRCAP complex) is required for maintaining enhancer accessibility and dynamic unwinding (Sporrij et al., 2023). In this study, we found that Znhit1 deletion and defective H2A.Z incorporation impaired enhancer ssDNA formation, indicating that ZNHIT1-dependent H2A.Z deposition plays an important role in the activity of both promoters and enhancers.

      (2) Impact of ssDNA loss on transcription: To address how missing ssDNA affects transcriptional activity, we further analyzed changes in KAS-seq signals following Znhit1 knockout. Overall, KAS-seq signals were significantly reduced upon Znhit1 depletion, confirming that Znhit1 is essential for ssDNA formation. KAS-seq signals at the promoters of downregulated genes were also reduced (revised manuscript, Fig. S8). In contrast, KAS-seq signals at upregulated genes remained relatively low and did not differ between the control and knockout groups, so their upregulation probably results from indirect regulation. These results underscore the importance of ZNHIT1-mediated chromatin states in regulating ssDNA formation and gene expression.

      (3) Aberrant chromatin openness in Znhit1-cKO (ATAC-seq): The increased chromatin accessibility detected by ATAC-seq likely represents a disorganized, nonfunctional state rather than productive transcriptional openness. H2A.Z normally constrains chromatin dynamics to facilitate ordered transcriptional regulation (Cole et al., 2021); its absence in Znhit1-cKO leads to higher ATAC-seq signals, suggesting that this aberrant openness fails to support proper assembly of the transcriptional machinery.

      Minor revisions:

      Line 106. The text says that they looked for chromatin factors, but the legend says that they looked for epigenetic factors. The text must be consistent.

      We have corrected it in the revised manuscript (line 801).

      Line 107. Although it is stated that the transcriptional data published here were used, it appears from the cited references that they are scRNA-seq data. A clear explanation is required in the text or legend.

      We now state explicitly that these are scRNA-seq data (line 107).

      Line 141-143: Using TUNEL analysis in Figure 4F, the authors show that Znhit1-cKO testes contain many dead cells. Describe the type or stage of the apoptotic cells.

      We appreciate the reviewer’s suggestion. Specifically, we performed TUNEL staining on testes isolated from P14 mice, a critical time point for pachytene development (revised Fig. 2D). In addition, the scRNA-seq data showed that apoptosis-related genes were significantly upregulated in pachytene-stage spermatocytes (revised Fig. 4D). To further validate this observation, we performed scRNA-seq on P35 testis samples. The results revealed a significant reduction in late pachytene-stage spermatocytes in Znhit1-cKO samples (revised Fig. 2F), consistent with apoptotic loss of pachytene cells. Collectively, these data confirm that Znhit1 knockout impairs pachytene-stage spermatocyte development.

      The authors claimed that the loss of Znhit1 lowers the transcription of a group of genes involved in homologous recombination, including Rnf212, causing a delay in homologous recombination; however, if the process of homologous recombination is delayed, homologous chromosome pairing and synapsis are affected unless DSB repair is completed. Provide a satisfactory explanation for the fact that DNA damage remains on autosomes despite complete synapsis, as shown in Figure 3C, which is likely not solely due to delayed homologous recombination.

      Thank you for this insightful comment. We fully agree that persistent autosomal DNA damage cannot be explained solely by delayed homologous recombination. To resolve this question, we further analyzed autosomal synapsis through SYCP1 and SYCP3 staining. While autosomal synapsis appeared morphologically complete, we identified subtle but significant synapsis defects in autosomal terminal regions (revised Fig. 3A). This suggests that Znhit1 knockout also results in autosomal synapsis defects. We speculate that these synapsis defects are associated with the unresolved autosomal DNA damage we observed.

      Lines 150-163. With regard to XY unpairing in Znhit1-cKO pachytene spermatocytes, there is insufficient discussion as to whether this is due to transcriptional aberrations.

      Thank you for highlighting the need to link transcriptional aberrations to XY unpairing in Znhit1-cKO pachytene spermatocytes. To address this, we analyzed sex chromosome transcription using the scRNA-seq data. Relative to controls, 120 XY-linked genes were aberrantly activated at zygotene and 119 were upregulated at pachytene in Znhit1-cKO spermatocytes (revised Fig. 4F), directly demonstrating that Znhit1 knockout disrupts meiotic sex chromosome inactivation (MSCI). Given that intact MSCI is required to stabilize XY synapsis in pachytene spermatocytes, we conclude that the observed XY unpairing is likely a direct consequence of these sex chromosome transcriptional abnormalities. We have added this information to the revised manuscript (lines 221-226).

      Line 187-194. Analysis of the scRNA-seq data is shown in Figure 4, but several of the genes listed as stage-specific markers do not have well-understood meiotic functions. Please cite references that provide sufficient evidence for using them to define these stages.

      In response to this comment, we have refined the presentation of marker genes used for cell annotation (revised Fig. S4B). We have incorporated relevant references supporting their utility as stage-specific markers for the meiotic stages (line 187).

      Line 225-233: If Znhit1 is important for H2AZ deposition and regulates PGA through it, how does it regulate HR-related genes that are expressed earlier through H2AZ deposition during the pachytene stage? For example, Rnf212 is not specifically expressed during the pachytene stage but is one of the targets of MEIOSIN, so it is expressed at an earlier stage.

      Thank you for this insightful comment. We fully acknowledge the reviewer’s key observation that HR-related genes such as Rnf212 are MEIOSIN targets that initiate transcription at earlier meiotic stages, before the pachytene stage. Our stage-resolved scRNA-seq data further showed that the expression of Ccnb1ip1 and Rnf212 was significantly upregulated from zygotene to pachytene, following their initial transcriptional onset. We next showed that the loss of H2A.Z deposition induced by Znhit1 deletion specifically impaired this pachytene-specific secondary transcriptional activation, rather than the early MEIOSIN-driven expression onset (please see Author response image 2).

      Author response image 2.

      Plots showing the expression levels of the indicated genes in the scRNA-seq data.

      Line 245-251: As shown in Figure 6E, more than 14,000 genes have H2AZ peaks. In contrast, only approximately 60% of the genes downregulated by Znhit1-cKO appeared to be directly affected by H2AZ. Are the remaining 40% of genes regulated in a different way that is not mediated by H2AZ? Also, only a few percent of the genes with H2AZ peaks are affected, but why are only genes with A-MYB involvement affected, as shown in Figure 7?

      Thank you for these insightful and constructive comments. For the ~40% of downregulated genes not directly linked to H2A.Z, they were likely regulated through indirect mechanisms. H2A.Z deposition mediated by ZNHIT1 may influence upstream transcriptional regulators (e.g., transcription factors or coactivators), whose dysregulation in turn affects these genes.

      The selective effect of H2A.Z loss on A-MYB target genes is explained by the strict context-dependent function of H2A.Z, which requires stage-specific partner transcription factors to exert its regulatory activity. During the zygotene-to-pachytene transition, A-MYB acts as the master regulator of pachytene gene activation and forms a functional collaborative complex with H2A.Z to drive target gene transcription. Disrupted H2A.Z deposition upon Znhit1 deletion specifically impairs the activity of this A-MYB-H2A.Z complex, leading to selective downregulation of A-MYB targets. Other H2A.Z peak-associated genes may rely on alternative cofactors and compensatory mechanisms.

      Line 245-256: Figures 6E and F show that the localization of H2AZ is reduced in Znhit1-cKO mice, which means that no substitution with H2A occurs. If so, show it in the data because the localization of H2A should be increased compared to that in the control.

      To clarify the status of H2A, we have now performed immunofluorescence staining for H2A. While H2A.Z deposition was clearly impaired following Znhit1 deletion, the global level of H2A did not change significantly (Author response image 3). We speculate that the absence of a compensatory increase in H2A is likely due to the intrinsically low abundance of the histone variant H2A.Z relative to canonical histone H2A under physiological conditions.

      Author response image 3.

      Immunostaining of SYCP3 and H2A in testis sections from control and Znhit1-sKO mice. Scale bar, 40 μm.

      Reviewer #2 (Public Review):

      Summary:

      The study demonstrates that Znhit1 regulates male meiosis, with deletion causing pachytene failure associated with defective expression of pachytene genes and subtle effects on X-Y pairing and DSB repair. The authors attribute this phenotype to the defective incorporation of the Znhit1 target H2A.Z into chromatin.

      Strengths:

      The paper and the figures are well presented and the narrative is clear. Evidence that the conditional deletion strategy removes Znhit1 is strong, with multiple orthogonal approaches used. Most of the meiotic phenotyping is well performed, and the omics analysis clearly identifies a dramatic effect on the meiotic gene expression program. The link to H2A.Z and A-MYB adds a mechanistic angle to the study.

      Weaknesses:

      (1) Current literature demonstrates that meiotic mutants arrest at one of two stages: mid-pachytene (stage IV of the seminiferous cycle) or metaphase I (stage XII of the seminiferous cycle). This study documents that in the Znhit1 KO the mid-pachytene marker H1t appears normally, but that cells arrest before diplotene. If this is true, then arrest must occur during late pachytene, which to my knowledge has never been documented for a meiotic KO. To resolve this, the authors should present stronger histological substaging evidence to support their claim.

      Thank you for this insightful and constructive comment. To achieve high-resolution tracking of cell lineage progression, we performed scRNA-seq analysis of P35 testes in this revised manuscript. The scRNA-seq data showed that in controls, germ cells progressed normally through all meiotic stages and gave rise to spermatids. By contrast, in the Znhit1 knockout group, late pachytene spermatocytes were significantly decreased, and very few subsequent germ cell types were observed (revised Fig. 2F, G). Although a small number of diplotene spermatocytes and metaphase I cells were detectable in the scRNA-seq data, these cells still appeared abnormal, as evidenced by their extremely low Pou5f2 expression. We have revised our description of the meiotic arrest stage in the manuscript.

      (2) The authors overlooked the possible effects of Znhit1 deletion on MSCI. Defective MSCI is a well-established cause of pachytene arrest. Actually, the fact that they see X-Y pairing failure should alert them even more strongly to this possibility because MSCI failure is often associated with defective X-Y pairing. This could be easily addressed by examination of their RNAseq data.

      To address the concern that Znhit1 deletion may affect meiotic sex chromosome inactivation (MSCI), we analyzed XY-linked gene expression using scRNA-seq data from spermatocytes at distinct stages. This analysis revealed aberrant activation of XY-linked genes in Znhit1-cKO spermatocytes relative to controls: 120 XY-linked genes were activated at zygotene, and 119 were upregulated at pachytene (revised Fig. 4F). This observation directly demonstrates that Znhit1 deletion impairs MSCI, which aligns with our prior characterization of defective X-Y chromosome synapsis in Znhit1-deficient spermatocytes. To explicitly resolve this concern, we have integrated these MSCI-focused RNA-seq analyses into the revised Results section (lines 221-226).

      (3) The recombination assays need attention.

      In the text the authors state that they studied RPA2 and DMC1, but the figures show RPA2 and RAD51.

      The RPA counts are not quantitated.

      The conclusion that crossover formation fails (based on MLH1 staining) is not justified. This marker does not appear in wt males until late pachytene, so if cells in this mutant are dying before that stage, MLH1 cannot be assessed.

      The authors state that γH2AX persists in the KO, but I'm not convinced that they are comparing equivalent stages in the wt and KO. In Figure 3C, the pachytene cell is late, whereas in the mutant the pachytene cell is early or mid (when residual γH2AX is expected, even in wt males).

      Previous work (PMID: 23824539) has shown that antibodies reportedly detecting pATM in the sex body are non-specific. I therefore advise caution with the data shown in Figure 3D.

      We appreciate the reviewer’s detailed feedback on our recombination assays and have addressed each concern as follows:

      (1) Discrepancy between text and figures (RPA2/DMC1 vs. RPA2/RAD51): We have corrected this in the revised manuscript.

      (2) Quantitation of RPA2 foci: We have supplemented quantitative analysis of RPA2 foci (revised Fig. S3).

      (3) Conclusion on crossover failure: Single-cell RNA sequencing data from P35 testes definitively confirmed that Znhit1 knockout spermatocytes successfully progressed to the late pachytene stage, ruling out the possibility that our MLH1 staining results are confounded by cell death or arrest before this critical stage. In addition, analysis of transcriptome datasets revealed significant downregulation of important genes required for homologous recombination and crossover formation, including Ccnb1ip1 and Rnf212. Reduced expression of these essential factors may impair the assembly of MLH1 crossover foci. These data demonstrate that ZNHIT1 is essential for proper homologous recombination and crossover formation during male meiosis. We have revised the text to emphasize this context.

      (4) γH2AX persistence and stage matching: We have replaced the images with more representative, stage‑matched pachytene spermatocytes from wild‑type and Znhit1‑KO mice (revised Fig. 2C). Furthermore, prompted by the insightful comment from Reviewer 1, we carefully re‑examined autosomal synapsis and identified abnormal synapsis specifically at the terminal regions of autosomes in Znhit1‑deficient spermatocytes (revised Fig. 3A). These data together confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) pATM staining issue: Following the reviewer’s advice, we carefully reviewed the relevant literature (PMID: 23824539) and confirmed that the anti‑pATM antibody may exhibit non‑specific staining on the XY chromosomes. Accordingly, we have removed the pATM staining data presented in Figure 3D from the revised manuscript to ensure the accuracy and rigor of our results.

      (4) RNA-seq data. The authors show convincingly that Znhit1 activates genes that are normally upregulated at the zyg-pachytene transition. They should repeat the analysis for genes normally upregulated at the prelep-lep and lep-zyg transitions to show that this effect is really pachytene-gene specific.

      We appreciate this suggestion. To clarify the stage specificity of ZNHIT1’s regulatory role, we analyzed genes upregulated at the prelep-lep and lep-zyg transitions. Our results showed that Znhit1 knockout had little impact on the overall expression levels of these genes (revised Fig. 4B). In contrast, as we previously reported, genes upregulated at the zygotene-pachytene transition were markedly downregulated in Znhit1-cKO. These findings further confirm the specificity of ZNHIT1 in regulating pachytene gene expression.

      (5) I am puzzled that the title and overall gist of the study focuses on H2A.Z, when it is Znhit1 that has been deleted.

      We appreciate the reviewer’s observation and have revised the study title as suggested. Specifically, the title is now updated to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis.”

      Reviewer #3 (Public Review):

      Summary:

      Sun et al. present a manuscript detailing the phenotypic characterization of loss of Znhit1 in male germ cells. Znhit1 is a subunit of the chromatin regulating complex SRCAP that functions to deposit the histone variant H2A.Z. Given that meiosis, and specifically meiotic recombination, occurs in the context of the dynamic condensing of chromosomes, the role of chromatin regulators in general, and histone variants specifically, in mammalian meiosis is an active area of research. Previous work has shown that H2A.Z is found at the locations of recombination in plants, although H2A.Z was previously not found at recombination sites in mammalian meiosis. Here the authors use a conditional approach to ablate Znhit1 in spermatocytes and characterize a block in meiosis in prophase I in the transition from pachytene to diplotene stage.

      Strengths:

      The authors combine current methods in immunohistochemistry and functional genomics to provide strong evidence of meiotic block upon the loss of Znhit1. They find that loss of Znhit1 leads to reduced incorporation of the histone variant H2A.Z, specifically at promoters and enhancers. Further, RNA sequencing found more genes are down-regulated upon loss of Znhit1 compared to upregulated, suggesting that incorporation of H2A.Z is critical for the expression of genes necessary for successful meiotic progression.

      A strength of the manuscript is tying the locations of changes in H2A.Z deposition with binding of the transcription factor A-MYB, providing a mechanism that can potentially combine the changes in chromatin regulation with variable binding of a transcription factor in gene expression in pachytene stage spermatocytes.

      Weaknesses:

      A weakness is the single-cell RNA-seq experiment, which used cells from 16-day-old male mice. The authors suggest that the rationale for the experiment was to determine where the Znhit1-sKO mutant showed an arrest in meiosis, and claim that this is the pachytene stage. However, in the 'first wave' of meiosis 16-day-old mice are just beginning to enter pachytene, so cells from later meiotic stages will be largely absent in these tubules. This is clear from the UMAP showing a similar pattern of cell distributions between wild-type and mutant mice. Using older mice would have better demonstrated where the mutant and wild-type mice differ in cell-type composition.

      We appreciate the reviewer’s constructive comment. To resolve this issue, we have added new scRNA‑seq data from testes of P35 mice, which harbor a full spectrum of meiotic stages, including late pachytene, diplotene, metaphase I spermatocytes, and post-meiotic spermatids. Compared with wild-type controls, Znhit1-sKO testes exhibited a marked reduction in late pachytene spermatocytes and a near-complete loss of post-pachytene cell types, directly validating the pachytene-stage meiotic arrest (revised Fig. 2F, G). All updated analyses have been integrated into the manuscript to strengthen our conclusions.

      The authors use the term pachytene genome activation (PGA) in the manuscript to suggest a novel process by which genes are specifically increased in expression in the pachytene stage of meiotic prophase I, without reference to literature that establishes the term. If the authors are putting forward a new concept defined by this term, it would strengthen the manuscript to describe it further, delineate which genes are activated, and discuss potential mechanisms.

      We appreciate the reviewer’s valuable feedback on our use of the term "pachytene genome activation (PGA)".

      To address this, we have revised the text to explicitly frame PGA as a stage-specific transcriptional program observed in our data, defined by the coordinated upregulation of a distinct set of genes during the pachytene stage of meiotic prophase I.

      (1) Definition and Gene Set: Using the scRNA-seq dataset, we formally defined PGA as the transcriptional wave characterized by genes with increased expression in pachytene vs. zygotene spermatocytes (n = 1,560 genes). Functional enrichment analysis shows these genes are primarily involved in DNA repair, cilium organization, and spermatid development (Table S3), consistent with the biological process of germ cell development.

      (2) Relationship to existing literature: While PGA is not a widely established term, our data align with prior observations of pachytene-specific transcriptional upregulation (Alexander et al., 2023; Ernst et al., 2019; Turner, 2015). Importantly, Alexander et al. (2023) showed that transcription increases ~3-fold in late meiotic stages, starting from pachynema. We have added these citations to clearly illustrate the relevant advances in the field (lines 68-71).

      (3) Regulation of pachytene-stage gene expression: We further delineate that PGA is regulated by ZNHIT1-dependent H2A.Z deposition. Znhit1 deletion resulted in significant downregulation of 70.1% (1,094 out of 1,560) of these genes. This links PGA to chromatin-based regulation, where ZNHIT1-dependent H2A.Z deposition enables pachytene-specific transcription.
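      The proportion reported in point (3) follows directly from the two gene counts given above:

$$\frac{1094}{1560} \approx 0.701 \;(70.1\%)$$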

      Generally speaking, the authors present solid evidence for a pachytene block in male germ cell development in mice lacking Znhit1 in spermatocytes. The changes in gene expression during pachytene, the finding that more genes are downregulated than upregulated in the mutant, and the changes in histone modification dynamics and H2A.Z placement all support a role in meiotic gene regulation. However, the claim that changes in H2A.Z affect meiotic recombination (as suggested in the manuscript title) is less well supported than a general arrest at the pachytene stage leading to cell death. The conclusions about Znhit1 directly influencing meiotic recombination need further justification or a mechanistic hypothesis.

      We acknowledge the reviewer’s comments. Indeed, existing data support the presence of a pachytene block in spermatocytes of Znhit1-deficient mice, along with aberrant pachytene gene expression and impaired H2A.Z deposition.

      In response, we made the following revisions: (1) we adjusted the manuscript title and conclusions to reduce emphasis on a direct H2A.Z-recombination link and to focus instead on the role of ZNHIT1/H2A.Z in pachytene gene regulation and meiotic progression; (2) we now state that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on the recombination machinery (lines 314-319).

      Reviewer #3 (Recommendations For The Authors):

      Quality of the images for meiotic spreads - images have low contrast and are tiny. It is difficult to see the SYCP3 results even when the images are magnified on the computer screen.

      We have provided new images with high resolution to ensure a clear visualization of SYCP3 signals.

      Line 165 - indicates the results for DMC1, although the figure suggests the results are for RAD51 foci.

      We have corrected this mistake.

      Line 306 - this manuscript 'confirms' that H2AZ is not found at mammalian recombination sites, a result already in the literature.

      We have corrected this mistake (lines 309-312).

      Reviewing Editor Comments:

      Major points and revisions highlighted by the reviewers:

      (1) Meiotic prophase in Znhit1KO: The main questions to clarify are the stage and status of progression, the analysis of apoptosis, and the consequences for gene expression on the X and Y chromosomes. Additional analysis of DSB repair foci and γH2AX is also required. These analyses are needed to address Reviewer 2's comments. Even if H2A.Z was not detected at recombination hotspots, it may play a role in DSB repair at levels too low for detection. This should be discussed, as H2A.Z has been shown to be involved in DNA repair.

      We sincerely appreciate the reviewing editor’s constructive comments.

      (1) Stage and progression of meiotic prophase: We performed additional scRNA-seq on P35 testes. The results confirmed that Znhit1-KO spermatocytes arrest at late pachytene, with post-pachytene stages (diplotene, metaphase I) nearly absent (revised Fig. 2F, G).

      (2) Apoptosis analysis: We found that apoptosis-related genes were upregulated in pachytene spermatocytes at the single-cell level (revised Fig. 4D). To further validate this finding, we performed scRNA-seq analysis on P35 testis samples. The results revealed a marked reduction in late pachytene spermatocytes in Znhit1-cKO testes (revised Fig. 2F, G), consistent with apoptotic depletion of pachytene-stage cells. Together, these data confirm that Znhit1 ablation impairs the development of pachytene-stage spermatocytes.

      (3) X/Y gene expression consequences: To address this key point, we performed stage-resolved analysis of XY-linked gene expression using scRNA-seq data from different-stage spermatocytes. Compared with controls, we detected aberrant ectopic activation of XY-linked genes in Znhit1-KO spermatocytes: 120 XY-linked genes were inappropriately activated at zygotene, and 119 remained abnormally upregulated at pachytene (revised Fig. 4F). These results provide direct evidence that Znhit1 deletion impairs Meiotic Sex Chromosome Inactivation (MSCI).

      (4) DSB repair issue: We have replaced the images with more representative, stage‑matched pachytene spermatocytes (revised Fig. 3C). The revised images show consistently increased γH2AX signals in Znhit1-KO spermatocytes. Prompted by Reviewer 1’s comment, we identified abnormal synapsis at autosomal terminal regions in mutant cells. Together, these results confirm that ZNHIT1 is essential for DSB repair during male meiotic prophase I.

      (5) Potential role of H2A.Z in DSB repair: Although H2A.Z was nearly undetectable at recombination hotspots, we discuss two possibilities: (1) loss of ZNHIT1-dependent H2A.Z deposition dysregulates DSB repair-related genes; (2) current ChIP-seq may lack the sensitivity to detect low-abundance H2A.Z at hotspots, where it could support repair via chromatin remodeling. We propose future high-resolution assays (super-resolution imaging, DSB-targeted ChIP-seq) to test these possibilities. We agree that recombination defects may be indirect consequences of failed pachytene gene regulation, rather than a direct regulatory effect of ZNHIT1 on the recombination machinery.

      (2) Gene expression analysis. The first consequence of H2A.Z depletion is downregulation of gene expression. However, it is perhaps not surprising that some genes are downregulated and others upregulated. There are likely secondary and indirect effects, including the upregulation of some genes. The authors should explain and discuss this point so as to answer the questions raised by Reviewers 1 and 2.

      The primary consequence of H2A.Z depletion in pachytene spermatocytes is indeed widespread gene downregulation. We explain the coexistence of upregulated genes through three key points.

      (1) Technical differences between scRNA-seq and bulk RNA-seq (addressing Reviewer 1): scRNA-seq captures cell-type-specific differentially expressed genes that bulk RNA-seq masks (bulk averages signals across mixed cells, hiding changes in rare subsets). Additionally, scRNA-seq uses a lower log2(fold change) threshold (0.25 vs. 1 in bulk RNA-seq), detecting subtle upregulations missed by bulk analysis.

      (2) No dead-cell contamination (addressing Reviewer 1): Stringent quality control excluded cells with >15% mitochondrial RNA. Apoptosis-related genes showed no significant correlation with mitochondrial RNA fractions (Pearson correlation coefficient, r = -0.02; please see Author response image 1), ruling out interference from dead-cell transcriptomes.

      (3) Secondary/indirect effects (addressing Reviewers 1 & 2): Upregulated genes likely result from indirect regulatory cascades. H2A.Z depletion may disrupt upstream transcription factors, leading to compensatory upregulation of their downstream genes, or may trigger cellular stress responses to meiotic arrest. Notably, Znhit1 knockout specifically affects genes upregulated at the zygotene-pachytene transition, whereas genes upregulated at the preleptotene-leptotene or leptotene-zygotene transitions remain largely unaffected (revised Fig. 4B). This supports the specificity of H2A.Z's direct regulatory role and is consistent with the upregulated genes reflecting non-targeted, indirect effects.

      (3) The authors should also test the effect of Znhit1KO on the 1196 genes (up PreL/L) and 1325 (up L/Z) as shown in Figure 5D for the PGA. Also in Figure 5B, there is no evaluation of the statistical significance of the variation, this should be revised. X and Y genes should be analysed. KAS-Seq should be correlated with gene expression analysis, and several points as mentioned in the reviews below should be better explained and discussed.

      (1) Effect of Znhit1-KO on PreL/L- and L/Z-upregulated genes: we analyzed the 1196 genes upregulated at the PreL/L transition and 1325 genes upregulated at the L/Z transition. Znhit1 knockout had minimal effect on the expression of these early meiotic gene sets (revised Fig. 4B), whereas genes activated at the zygotene‑pachytene transition were strongly downregulated in Znhit1-KO spermatocytes. These results confirm the specific role of ZNHIT1 in regulating pachytene‑stage gene expression. We have also added a statistical evaluation for the variation shown in Fig. 4B.

      (2) X/Y-linked gene analysis: Analysis of stage‑resolved scRNA‑seq revealed aberrant ectopic activation of 120 XY‑linked genes at zygotene and 119 at pachytene in Znhit1-KO spermatocytes (revised Fig. 4F), demonstrating impaired Meiotic Sex Chromosome Inactivation (MSCI).

      (3) KAS-seq correlation with gene expression: We analyzed the link between KAS‑seq signals and gene expression, and we found that Znhit1 depletion caused a global reduction in KAS‑seq signals, especially at promoters of downregulated genes (revised Fig. S8). Genes with increased expression showed low KAS‑seq signals in both control and mutant groups, likely reflecting indirect regulation. These results highlight the essential role of ZNHIT1 in transcriptional regulation.

      (4) The title should refer to Znhit1, and the effect on meiotic recombination activities may be an indirect consequence of prophase progression arrest, even if some recombination genes are downregulated. This point is important as noted by reviewer 3.

      We fully acknowledge Reviewer 3’s key point and have revised the manuscript title to “ZNHIT1-dependent H2A.Z deposition at meiotic prophase I underlies pachytene gene expression and meiotic progression during male meiosis” to reduce emphasis on a direct H2A.Z-recombination link.

      Regarding meiotic recombination activities: The downregulation of recombination-related genes (e.g., Ccnb1ip1, Rnf212) stems from impaired pachytene-stage transcriptional programs caused by defects in ZNHIT1-dependent H2A.Z deposition, which in turn lead to arrest of prophase progression. Thus, the observed recombination abnormalities may be a secondary consequence of meiotic prophase arrest, rather than a direct regulatory effect of ZNHIT1 on the recombination machinery. This clarification has been integrated into the Discussion (lines 314-318).

      (5) The recent structural analysis of SRCAP should be cited: Yu et al. Cell Discovery (2024) 10:15 https://doi.org/10.1038/s41421-023-00640-1.

      We have cited this reference in this revised manuscript (lines 234-236).

      (6) The authors should read and answer the specific revisions asked for by the reviewers.

      We have thoroughly read and systematically addressed all specific revisions requested by Reviewers 1, 2, and 3, as detailed in the revised manuscript and supplementary data.

      References

      Alexander, A.K., Rice, E.J., Lujic, J., Simon, L.E., Tanis, S., Barshad, G., Zhu, L., Lama, J., Cohen, P.E., and Danko, C.G. (2023). A-MYB and BRDT-dependent RNA Polymerase II pause release orchestrates transcriptional regulation in mammalian meiosis. Nature communications 14.

      Cole, L., Kurscheid, S., Nekrasov, M., Domaschenz, R., Vera, D.L., Dennis, J.H., and Tremethick, D.J. (2021). Multiple roles of H2A.Z in regulating promoter chromatin architecture in human cells. Nature communications 12, 2524.

      Ernst, C., Eling, N., Martinez-Jimenez, C.P., Marioni, J.C., and Odom, D.T. (2019). Staged developmental mapping and X chromosome transcriptional dynamics during mouse spermatogenesis. Nature communications 10, 1251.

      Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., et al. (2010). Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187.

      Sporrij, A., Choudhuri, A., Prasad, M., Muhire, B., Fast, E.M., Manning, M.E., Weiss, J.D., Koh, M., Yang, S., Kingston, R.E., et al. (2023). PGE(2) alters chromatin through H2A.Z-variant enhancer nucleosome modification to promote hematopoietic stem cell fate. Proceedings of the National Academy of Sciences of the United States of America 120, e2220613120.

      Turner, J.M. (2015). Meiotic Silencing in Mammals. Annu Rev Genet 49, 395-412.

      Wu, T., Lyu, R., You, Q., and He, C. (2020). Kethoxal-assisted single-stranded DNA sequencing captures global transcription dynamics and enhancer activity in situ. Nature methods 17, 515-523.

    1. eLife Assessment

      This potentially useful manuscript addresses the 3D chromatin architecture in monocytes from a few patients with alcohol-associated hepatitis and its relationship to enhanced transcription of innate immune genes. While the concept and methodological approach are interesting in principle, the evidence is incomplete as a result of inadequate sample sizes as well as other substantive analytical concerns.

    2. Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy donors and of patients with alcohol-associated hepatitis (AH).

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell-type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both the healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles, for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture, they should carefully FACS-sort the equivalent monocyte populations from the healthy and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs) and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

      Comments on revisions:

      In the revision, the authors did not respond to my concerns, which I believe remain valid and compromise the authors' conclusions about AH-specific effects on genome architecture.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors investigate the relationship between 3D chromatin architecture and innate immune gene regulation in monocytes from patients with alcohol-associated hepatitis (AH). Using Hi-C technology, they attempt to identify structural changes in the genome that correlate with altered gene expression. Their central claim is that genome restructuring contributes to the hyper-inflammatory phenotype associated with AH.

      Strengths:

      (1) The manuscript employs Hi-C technology, which, in principle, is a powerful approach for studying genome organization.

      (2) The focus on disease-relevant genes, particularly innate immune loci, provides a contextually important angle for understanding AH.

      Weaknesses:

      (1) Sample Size: The study relies on an exceptionally small cohort (4 AH patients and 4 healthy controls), rendering the results statistically underpowered and highly susceptible to variability.

      (2) Hi-C resolution and lack of pairing with RNA-seq: The data are presented at a resolution of 100 kb, which is insufficient to uncover meaningful chromatin interactions at the level of individual genes. In addition, the Hi-C data are not paired with the RNA-seq data.

      (3) Functional Validation: The manuscript lacks experiments to directly link changes in chromatin architecture with gene expression or monocyte function, leaving the claims speculative.

      (4) Data Integration: The lack of integration of Hi-C with ATAC-seq and RNA-seq data handicaps the analysis and makes it superficial. In short, it does not convincingly demonstrate a functional relationship.

      (5) Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      Appraisal of the Aims and Results:

      The manuscript sets out to establish a connection between chromatin architecture and AH pathology. However, the study fails to achieve its stated aims due to inadequate methods and insufficient data. The conclusions drawn from the Hi-C analyses alone are poorly supported, and the lack of functional validation undermines the credibility of the proposed mechanisms. Overall, the results do not provide compelling evidence to substantiate the authors' claims.

      Impact on the Field and Utility to the Community:

      The work, in its current form, is unlikely to have a meaningful impact on the field. The limited scope, methodological shortcomings, and lack of robust data significantly diminish its potential utility. Without addressing these critical gaps, the study does not offer new insights into the role of genome architecture in AH or provide useful methodologies or datasets for the community.

      Additional Context:

      The manuscript would benefit from a more comprehensive analysis of potential mechanisms underlying the observed changes, including the interplay between chromatin architecture and epigenetic modifications. Furthermore, longitudinal studies or therapeutic interventions could provide insights into the dynamic aspects of genome restructuring in AH. These considerations are entirely absent from the current study.

      Conclusion:

      The manuscript does not achieve its stated goals and does not present sufficient evidence to support its conclusions. The limitations in sample size, resolution, and experimental rigor severely hinder its contribution to the field. Addressing these fundamental flaws will be essential for the work to be considered a meaningful addition to the literature.

      Reviewer #2 (Public review):

      Summary:

      Dr. Adam Kim and collaborators study changes in chromatin structure in monocytes obtained from patients with alcohol-associated hepatitis (AH) compared with healthy controls (HC). Using high-throughput chromosome conformation capture (Hi-C), they collected data on contact frequencies between both contiguous and distal DNA windows (100 kb each), mainly within the same chromosome. From analyses of these data in the two cohorts, the authors describe frequent pairs of regions with significant changes in contact frequency between cohorts. The accumulation of these pairs in specific regions of the genome, referred to as hotspots, motivated the authors to narrow their analyses to these disease-associated regions, many of which, the authors claim, contain key innate immune genes. Ultimately, the authors try to link the changes in chromatin architecture observed in some of these hotspots to the differential co-expression of genes lying within those regions, as ascertained in previous single-cell transcriptomic analyses.

      Strengths:

      The main strength of this paper lies in the generation of Hi-C data from patients, a valuable asset that, as the authors emphasize, offers critical insights into the role of dysregulated chromatin architecture in the pathogenesis of alcohol-associated hepatitis (AH). If confirmed, the reported findings have the potential to highlight an important yet overlooked aspect of cellular dysregulation, namely changes in chromatin conformation, not only in AH but potentially in other immune-related conditions with a component of pathological inflammation.

      Weaknesses:

      The two most important weaknesses of the work are, in my view, more methodological than conceptual. The first concerns the insufficient description of how some key types of genomic regions are defined, such as topologically associated domains, DNA hotspots, or even DNA loci showing significant changes in contact frequency between AH and HC. Despite the importance of these concepts in the paper, the current version of the manuscript provides no explicit, operational description of how they are defined from a statistical point of view.

      Without these definitions, some of the claims that authors make in their work become hard to sustain. Some examples are the claim that randomizing samples does not lead to significant differences between cohorts; the claim that most of the changes in contact frequency happen locally; or the claim that most changes do not alter the structure of TADs, but appear either within, or between TADs. In my viewpoint, specific descriptions and implementation of proper tests to check these hypotheses and back up the mentioned specific claims, along with the inclusion of explicit results on these matters, would contribute very significantly to strengthening the overall message of the paper.

      The second notable weakness of the study pertains to the characterization of the changes observed around immune genes in relation to genome-wide expectations. Although the authors suggest that certain hotspots contain a high number of immune-related genes, no enrichment analysis is provided to verify whether these regions indeed harbor a higher concentration of such genes compared to other genomic areas. It would be important for readers to be promptly informed if no such enrichment is observed, for in that case, the presence of some immune genes within these hotspots would carry more limited implications.

      Additionally, the criteria used to define a hotspot are not clearly outlined, making it difficult to assess whether the changes in contact frequencies around the immune genes highlighted in figures 5-8 are truly more pronounced than what would be expected genome-wide.

      Reviewer #3 (Public review):

      In this manuscript, the authors use HiC to study the 3D genome of CD14+ CD16+ monocytes from the blood of healthy donors and of patients with alcohol-associated hepatitis (AH).

      Overall, the authors perform a cursory analysis of the HiC data and conclude that there are a large number of changes in 3D genome architecture between healthy and AH patient monocytes. They highlight some specific examples that are linked to changes in gene expression. The analysis is of such a preliminary nature that I would usually expect to see the data from all figures in just one or two figures.

      In addition, I have a number of concerns regarding the experimental design and the depth of the analyses performed that I think must be addressed.

      (1) There is a myriad of literature that describes the existence of cell type-specific 3D genome architecture. In this manuscript, there is an assumption by the authors that the CD14+ CD16+ monocytes represent the same population from both healthy and diseased patients. Therefore, the authors conclude that the differences they see in the HiC data are due to disease-related changes in the equivalent cell types. However, I am concerned that the AH patient monocytes may have differentiated due to their environment so that they are in fact akin to a different cell type and the 3D genome changes they describe reflect this. This is supported by published articles for example: Dhanda et al., Intermediate Monocytes in Acute Alcoholic Hepatitis Are Functionally Activated and Induce IL-17 Expression in CD4+ T Cells. J Immunol (2019) 203 (12): 3190-3198, in which they show an increased frequency of CD14+ CD16+ intermediate monocytes in AH patients that are functionally distinct.

      I suggest that if the authors would like to study the specific effects of AH on 3D genome architecture, they should carefully FACS-sort the equivalent monocyte populations from the healthy and AH patients.

      (2) The analysis of the HiC data is quite preliminary. In the 3D genome field, it is usual to report the different scales of genome architecture, for example, compartments, topologically associated domains (TADs), and loops. I think that reporting this information and how it changes in AH patients in the appropriate cell types would be of great interest to the field.

      We thank the reviewers for their careful and thorough examination of our manuscript. We agree with all of their comments regarding the limitations of the study. Many of the criticisms focus on the small sample size of our study (n=4 for healthy controls and disease patients) in both Hi-C and single-cell RNA-seq experiments, and that these experiments are unpaired, or in other words, PBMCs came from different patients for each experiment.

      Unfortunately, these experiments are complicated to perform, requiring patient cells and very expensive deep sequencing. We are not currently in a position to increase the sample size easily or cost-effectively. In the case of Hi-C, we still believe our study is of value, as Hi-C is not commonly used to study disease effects on chromatin, and very few studies have employed a sample size large enough to perform statistical comparisons. Additionally, analyzing the data at higher resolution would require deeper sequencing, and unfortunately we do not have the resources to sequence these libraries more deeply. Regarding the single-cell RNA-seq data, this dataset was generated for an earlier study [1] focusing on gene expression responses to LPS, and we were unable to obtain PBMCs from the same patients to perform the Hi-C study.

      We disagree that our study has limited scientific value. Our study is the first to use Hi-C to show that the 3D genome architecture of primary monocytes is changed in a disease context. The only other study to follow a similar approach performed Hi-C in monocytes from 2 healthy donors and 2 patients with systemic lupus erythematosus (SLE), and in that study the data from both patients were combined prior to comparison. No statistical tests were performed, and its authors concluded that disease caused no differences in genome architecture. They did find differences between primary monocytes and the THP1 monocytic cell line, but this comparison also lacked statistical analysis. Their conclusion was that inflammatory disease may not lead to genome-wide changes in architecture. Our study, although of a very different disease than SLE, shows statistically significant differences between AH and healthy controls. We believe our study lays the groundwork for using Hi-C to study genome architecture in human disease and its possible downstream effects.

      Confounding Factors: The manuscript neglects critical confounding variables such as comorbidities, medications, and lifestyle factors, which could influence chromatin structure and gene expression independently of AH.

      This is an interesting suggestion. This dataset contains only 4 AH patients, for whom we have included basic clinical data in Supplemental Table 1, including age, HbA1c, bilirubin, AST, ALT, creatinine, albumin, and MELD score. Three of the four patients have severe AH, while one has moderate AH (AH2). Despite one patient having moderate disease, all four AH patients showed similar correlations with each other, suggesting that the disease-specific differences we observed do not reflect differences in severity. More patient samples are needed to determine whether genome architecture changes over the course of disease progression. We have added this discussion to the manuscript (page 12, lines 5-14).

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      The criteria used to determine which pairs of regions exhibit significant differences in contact frequency between alcohol-associated hepatitis (AH) and healthy controls (HC) are not disclosed. It would be beneficial for the authors to provide this information, including details such as the number of pairs tested, the nature of the statistical tests conducted, the method of multiple testing correction applied, as well as the significance thresholds used, and the number of loci-pairs below these thresholds for each chromosome. This information would greatly enhance the reader's understanding of the relevance of the reported findings.

      Thank you for this comment, although we are not certain we fully understand it. All of our statistics were performed using multiHiCcompare [2], into which we input all 8 datasets (.hic files from Juicer), and then measured statistical differences between the defined groups (HC vs AH). For our randomization studies, we randomized the group comparisons so that each group contained a mix of HC and AH samples.

      Second, a formal statistical definition of what constitutes a hotspot would be valuable for clarity.

      Thank you for this suggestion. Initially, hotspots were defined simply as regions of the genome with a high frequency of highly significant differential contacts. We have now formulated a more formal definition based on similar criteria. A hotspot is defined by both the adjusted p-value and the frequency of differential locations. First, we filtered all pairwise chromosomal interactions using a very stringent threshold (padj < 0.0000001) to focus only on the most changed coordinates (Supplemental Table 4). We then looked for regions of the genome with a high frequency of these differential locations. The borders of each hotspot were determined more liberally, using the full list of differential contacts (padj < 0.05). Finally, we listed the genes within each interacting region. We have added these details to the Methods (page 14, lines 11-14).
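      To make the hotspot procedure concrete, the Python sketch below shows one way it could be implemented. It is an illustration under stated assumptions, not the analysis code used in this study: only the two padj thresholds and the 100-kb bin size come from the text, while the record layout (`DiffContact`), the minimum number of stringent hits per bin that counts as a "high frequency" (`min_hits`), and the rule of extending borders through adjacent bins with padj < 0.05 are hypothetical choices.

```python
from collections import Counter
from dataclasses import dataclass

BIN = 100_000      # bin size: the 100-kb Hi-C resolution used in the study
STRINGENT = 1e-7   # padj cutoff used to seed hotspots
LIBERAL = 0.05     # padj cutoff used to extend hotspot borders


@dataclass
class DiffContact:
    """One differential contact between two bins (hypothetical record layout)."""
    chrom: str
    region1: int  # start (bp) of the first bin
    region2: int  # start (bp) of the second bin
    padj: float


def find_hotspots(contacts, min_hits=3):
    """Return (chrom, start, end) hotspots; min_hits is a hypothetical cutoff."""
    # Step 1: count how often each bin anchors a stringently significant contact.
    counts = Counter()
    for c in contacts:
        if c.padj < STRINGENT:
            counts[(c.chrom, c.region1)] += 1
            counts[(c.chrom, c.region2)] += 1
    seeds = sorted(k for k, n in counts.items() if n >= min_hits)

    # Step 2: collect every bin that anchors any contact with padj < 0.05.
    liberal = set()
    for c in contacts:
        if c.padj < LIBERAL:
            liberal.add((c.chrom, c.region1))
            liberal.add((c.chrom, c.region2))

    # Step 3: grow each seed outward through adjacent liberal bins.
    hotspots, covered = [], set()
    for chrom, start in seeds:
        if (chrom, start) in covered:
            continue  # already inside a hotspot grown from an earlier seed
        lo = hi = start
        while (chrom, lo - BIN) in liberal:
            lo -= BIN
        while (chrom, hi + BIN) in liberal:
            hi += BIN
        covered.update((chrom, pos) for pos in range(lo, hi + BIN, BIN))
        hotspots.append((chrom, lo, hi + BIN))  # end coordinate is exclusive
    return hotspots


example = [
    DiffContact("chr1", 1_000_000, 1_100_000, 1e-9),
    DiffContact("chr1", 1_000_000, 1_200_000, 1e-8),
    DiffContact("chr1", 1_100_000, 1_200_000, 1e-10),
    DiffContact("chr1", 1_300_000, 5_000_000, 0.01),
    DiffContact("chr2", 2_000_000, 2_100_000, 0.5),
]
print(find_hotspots(example, min_hits=2))  # [('chr1', 1000000, 1400000)]
```

      In this toy input, three adjacent chr1 bins each anchor two stringent contacts and seed a hotspot, whose border then extends one more bin (1.3 Mb) because that bin anchors a contact with padj < 0.05; the chr2 contact is not significant and is ignored.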

      Third, a clear definition of the criteria used to identify different topologically associated domains (if these were indeed defined in the data and/or utilized in the analyses) would also be a helpful addition.

      Thank you for this suggestion. We did not identify TADs or use them in any of these analyses.

      Likewise, several statements throughout the paper lack support from specific analyses, although it should be feasible to implement such analyses (or at least present them if they have already been conducted) to substantiate these claims:

      If randomizing samples does not result in significant differences between (randomized) cohorts, it would be beneficial to provide insights into the number of loci pairs that exhibit differences in frequency when using both the actual and randomized cohorts.

      Thank you for asking this question, as this is an important point. Using multiHiCcompare, comparing WT (n=4) with AH (n=4) gives the results shown in the figures and supplementary data, but if we randomize the groups, Group 1 (WT, WT, AH, AH) vs Group 2 (WT, WT, AH, AH), we obtain almost no significant changes in contact frequency. To show this more robustly, we performed 5 randomized comparisons and found far fewer changes in contact frequency between groups. This indicates that the changes in contact frequency associated with disease are not random, but reflect a real difference in AH. This point has been added to the Results (page 6, lines 15-17) and Methods (page 14, lines 16-21).
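      The balanced randomized groupings described above can be enumerated as follows. This is a sketch with hypothetical sample labels, and it does not record which five splits were actually analyzed. With 4 healthy controls and 4 AH samples, there are C(4,2) × C(4,2) = 36 ways to pick two samples of each type for Group 1; because the complementary Group 2 is also balanced, each split is counted twice, leaving 18 distinct splits from which five can be drawn at random.

```python
import random
from itertools import combinations

# Hypothetical labels for the 4 healthy controls and 4 AH patients.
HC = ["HC1", "HC2", "HC3", "HC4"]
AH = ["AH1", "AH2", "AH3", "AH4"]


def balanced_splits(hc, ah):
    """List each distinct split into two groups of 2 HC + 2 AH samples."""
    everyone = frozenset(hc + ah)
    seen, splits = set(), []
    for hc_pick in combinations(hc, 2):
        for ah_pick in combinations(ah, 2):
            group1 = frozenset(hc_pick + ah_pick)
            group2 = everyone - group1
            key = frozenset([group1, group2])  # (A, B) and (B, A) are the same split
            if key not in seen:
                seen.add(key)
                splits.append((sorted(group1), sorted(group2)))
    return splits


splits = balanced_splits(HC, AH)
print(len(splits))  # 18

# Draw five randomized comparisons (fixed seed so the draw is reproducible).
chosen = random.Random(0).sample(splits, 5)
for group1, group2 in chosen:
    print(group1, "vs", group2)
```

      Each drawn pair of groups would then be compared in the same way as the real HC vs AH contrast; a real disease effect should yield many more significant differential contacts than these label-mixed comparisons.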

      If most changes in contact frequency occur locally, it would be useful to visualize the relationship between effect sizes and/or significance levels for the observed differences in frequency in relation to the distance between the involved loci. Additionally, comparing these results to the average baseline contact intensities as a function of distance would be informative. This comparison could help determine whether the distance decay in effect size/significance for the differences between AH and HC is faster or slower than the decay rates for baseline contact frequencies.

      This is a good suggestion. In our initial analysis, we made a number of figures relating chromosome positions, distance between loci, and statistics on differential contact frequency. In the initial submission, we showed only Figure 3, which plots the logFC (log fold change) in differential contact frequency by chromosomal position on both sides. To address this question, we have added a supplemental figure showing logFC as a function of the distance between the two loci (new Supplemental Figure 3).

      Similarly, the assertion that most changes do not affect the structure of topologically associated domains (TADs) but occur either within or between TADs should be supported by specific testing or else removed.

      Thank you; yes, we have adjusted the language in the Discussion.

      Furthermore, the authors should clarify whether differences in chromatin conformation are more pronounced around immune genes compared to genome-wide expectations. If this is not the case, it would be helpful to quantify the intensity of these differences around the highlighted genes in relation to the rest of the genome. To achieve this, I would suggest the following:

      Conduct enrichment analyses on the genes located within the most prominent hotspots to determine whether they are significantly enriched in immune genes (and/or, alternatively, in any other functional category).

      Estimate the average absolute fold change in contact frequency within all topologically associated domains (TADs) identified in the study. This would allow for the identification of immune gene-containing TADs highlighted in Figures 5-8, providing readers with a quantitative understanding of how anomalous these genomic regions are with regard to the magnitude of their alterations in AH, compared to the rest of the genome.

      While some of the selected gene clusters appear to co-localize well with topologically associated domains (e.g., Figures 5A, 8A), others seemingly encompass either multiple TADs (Figure 6) or only portions of them (Figure 7). This should be clarified.

      Thank you, this is a great suggestion. To be as unbiased as possible, we took all genes present in the regions with the most significant changes across the genome (Supplemental Table 4), which we used to identify the hotspots. And you are correct: we do in fact see enrichment of genes involved in innate immune signaling. This has been added to the Results (page 7, lines 19-25) and Figure 4.
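      A one-sided hypergeometric test is a common way to ask whether a gene category is over-represented among hotspot genes relative to a background set. The sketch below is a minimal standard-library version; the background, category and count values are placeholders, the text does not specify which enrichment tool was used, and an analysis over many categories would also need multiple-testing correction.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_category, n_hotspot, n_hotspot_in_category):
    """One-sided P(X >= observed) for X ~ Hypergeometric.

    n_background: genes in the background set (e.g. all annotated genes)
    n_in_category: background genes in the category (e.g. innate immune signaling)
    n_hotspot: genes located in the hotspot regions
    n_hotspot_in_category: hotspot genes that fall in the category
    """
    total = comb(n_background, n_hotspot)
    upper = min(n_in_category, n_hotspot)
    tail = sum(
        comb(n_in_category, i) * comb(n_background - n_in_category, n_hotspot - i)
        for i in range(n_hotspot_in_category, upper + 1)
    )
    return tail / total


# Placeholder numbers, not the study's data.
print(hypergeom_enrichment_p(20_000, 300, 150, 12))
```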

      Finally, there are several minor issues concerning the figures that could be easily addressed to substantially enhance their readability:

      Font sizes in most figures should be increased, particularly for some axis labels and tick marks. This issue affects most figures; for instance, in Figure 4, it hinders the reader's ability to interpret the ranges of the data presented.

      Thank you; the figures have been adjusted.

      Figures 5 to 8 (panels A and B) would benefit significantly from a more consistent format. Specifically, the gene cluster boxes should also be included in the right panels, and the gene locations should be displayed on the left in a uniform format across all figures (e.g., formatting Figures 7 and 8 to match the style of Figures 5 and 6).

      Figures 5 and 6 have a similar structure to each other because we were focusing on all of the genes in that chromosomal region. Figures 7 and 8 are different because we are focusing on how the region around a certain hotspot of interest changes.

      It is also important to note that the genes plotted in Figures 8C and 8D are not the same. Concerning these two panels, it would be valuable to clarify whether the data presented pertains exclusively to monocytes. If so, information regarding the number of cells analyzed and the number of donors from which they were drawn would also be beneficial.

      These figures are generated using scRNA-seq data. They represent all of the genes expressed in that region of the genome, in their chromosomal position. If a gene is not expressed in the scRNA-seq data, then it is not shown. I have debated with myself a lot on how to show gene expression in a region of the genome, but I think this is the clearest way to show this; including the genes that have no expression would make it more confusing. But yes, if you compare HC and AH, you see some differences in the list of genes. We have added more clarity to the figure legend for this figure.

      References

      (1) Kim, A., Bellar, A., McMullen, M. R., Li, X. & Nagy, L. E. Functionally Diverse Inflammatory Responses in Peripheral and Liver Monocytes in Alcohol-Associated Hepatitis. Hepatol Commun 4, 1459-1476 (2020). https://doi.org/10.1002/hep4.1563

      (2) Stansfield, J. C., Cresswell, K. G. & Dozmorov, M. G. multiHiCcompare: joint normalization and comparative analysis of complex Hi-C experiments. Bioinformatics 35, 2916-2923 (2019). https://doi.org/10.1093/bioinformatics/btz048

    1. eLife Assessment

      This valuable study provides quantitative data and analysis to reveal that variations in Dorsal (Dl) nuclear dynamics along the dorsoventral axis in the early Drosophila embryo are governed by Dl-Cactus nuclear interactions. The solid evidence partially supports a mechanism in which nuclear-localized Cactus contributes to the fraction of Dl that binds to DNA, but additional work will be necessary to confirm the claims and the biological significance of these findings.

    2. Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues here apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair correlation function spectroscopy) to address the nucleo-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and represents one of a handful of model morphogen gradients produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measurement of GFP-tagged Dl protein, either in wild-type embryos, or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The study uses raster image correlation spectroscopy approaches to measure biophysical components of the Dl gradient in Drosophila embryos. It convincingly demonstrates a positive correlation between Toll pathway activity and the fraction of bound Dl in the nucleus. RICS methodology has widespread potential applications in cell and developmental biology, and this study will contribute to its adoption.

      Weaknesses:

      The study seeks to test a hypothesis for how the Toll pathway may limit Dl DNA binding in the nucleus. This experiment, while producing initial support for a role of nuclear Cactus, is confounded by co-expression of wild-type Dl, thus limiting the interpretation of the experimental results.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al. use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying Dl gradient formation have been extensively studied, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors separated GFP-tagged Dl into mobile and immobile pools. Interestingly, the fraction of immobile Dl is position-dependent, revealing more binding to DNA in ventral than in dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding DNA. Using specific dl alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      The main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A). While it is interesting that a small fraction of immobile Dl significantly increases in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect is in a location where Dl is nearly absent.

      Another weakness of the study is that experiments are performed in the presence of a wild-type GFP-tagged Dl (unfortunately, the Dl gradient does not form without it; Supplemental Figure 4). This is an unfortunate technical limitation, because it does not allow testing how important Cact binding is for determining the amount of Dl that could bind DNA in more biologically relevant locations of the embryo (e.g., in lateral regions).

      Overall, I feel that the manuscript exemplifies how FCS methods and analysis can be used to estimate biophysical parameters and test biological hypotheses, even at very low concentrations (such as Dl in dorsal-most nuclei). However, due to technical limitations, it falls short of offering a real quantitative understanding of the proposed mechanisms. The authors did not report in Figure 5 what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Al Asafen and colleagues apply a set of scanning fluorescence correlation spectroscopic approaches (Raster Image Correlation Spectroscopy (RICS), cross-correlation RICS, and pair-correlation function spectroscopy) to address the nuclear-cytoplasmic kinetics of the Dorsal (Dl) transcription factor in early Drosophila embryos. The Toll/Dl system has long been appreciated to establish dorsal-ventral polarity of the embryo through Toll-dependent control of Dl nuclear localization, and provides an example of a morphogen gradient produced with high enough precision to yield robust biophysical measurements of general transcription factor activity and function. By measuring GFP-tagged Dl protein, either in wild-type embryos or in mutant embryos with low/medium/high levels of Toll signaling, the authors report diffusivity of Dl in nuclear and cytoplasmic compartments of the embryo, as well as the fraction of mobile and immobile Dl, which can be correlated with DNA binding through cross-correlation RICS. A model is presented where Cactus/IkB is implicated in preventing Dl from binding to DNA.

      Strengths:

      The experiments on wild-type GFP-tagged Dorsal are performed well, are mostly reported well, and are interpreted fairly.

      Weaknesses:

      The discrepancy between experiment and theory as pertains to Michaelis-Menten kinetics is not fully motivated in the text and could benefit from a clearer presentation. The experiments performed to distinguish between the contribution of Toll-dependent phosphorylation and Cactus interaction models for limiting Dorsal DNA binding are possibly confounded by the presence of wild-type, GFP-tagged Dorsal protein.

      Thank you for your thoughtful feedback. Regarding the discrepancy between experiment and theory in relation to Michaelis-Menten kinetics, we recognize that our initial explanation may not have been explicit enough. Our intent was to illustrate that if DNA binding is a saturable process, then while the absolute concentration of Dl bound to DNA will increase with total Dl levels, the fraction of Dl bound to DNA will decrease. We used Michaelis-Menten kinetics only as a familiar example to convey this concept but did not intend to suggest that the system strictly follows Michaelis-Menten behavior. To clarify this point, we removed mention of Michaelis-Menten as an illustrative analogy and instead describe the system specifically as “saturating.” This change primarily affected the paragraph starting on Line 204, as well as Lines 323-325.
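      The saturation argument can be checked numerically. The sketch below assumes, purely for illustration, simple 1:1 binding of Dl to a fixed pool of DNA sites with a single dissociation constant, solved exactly (including depletion of free Dl); it is not the model used in the manuscript, and the site concentration and Kd are arbitrary. Binding curves of this saturable kind give the qualitative result described above: bound Dl rises with total Dl while the bound fraction falls.

```python
from math import sqrt


def bound_dl(total_dl, sites, kd):
    """Equilibrium concentration of DNA-bound Dl for 1:1 binding to a fixed pool of sites.

    Solves B**2 - (T + S + Kd) * B + T * S = 0 for the physical root (B <= min(T, S)),
    written as 2*T*S / (b + sqrt(b**2 - 4*T*S)) to avoid numerical cancellation.
    Units are arbitrary but must be consistent.
    """
    b = total_dl + sites + kd
    return 2.0 * total_dl * sites / (b + sqrt(b * b - 4.0 * total_dl * sites))


sites, kd = 1.0, 0.5  # hypothetical values
for total in (0.5, 1.0, 2.0, 4.0):
    bound = bound_dl(total, sites, kd)
    print(f"total={total:.1f}  bound={bound:.3f}  fraction bound={bound / total:.3f}")
```

      With these hypothetical values, bound Dl climbs from roughly 0.29 to 0.86 as total Dl increases eightfold, while the fraction bound drops from roughly 0.59 to 0.22.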

      Regarding the concern about potential confounding effects due to the presence of wildtype GFP-tagged Dorsal (Dl[wt]-GFP): we understand the importance of addressing this point more directly. Therefore, we have imaged the Dorsal-GFP gradient in embryos expressing the UAS-dl[S280P]-GFP or the UAS-dl[S317A]-GFP constructs in the absence of the BAC-recombineered Dl-GFP construct. In both cases, the dl mutants by themselves were not able to recapitulate enough of the Dl gradient to test our hypotheses. We have added this analysis to Supplemental Figure 4 and mentioned this figure on Lines 333-336 and 354-358. Furthermore, we explicitly mention that it is possible the reason why we failed to reject the null hypothesis in the Toll phosphorylation mutant case may be due to the additional copy of Dl[wt]-GFP (the BAC recombineered construct), with text added to Lines 343-345, 365-369 (Results) and 408-418 (Discussion).

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Al Asafen, Clark et al., use fluorescence correlation spectroscopy (FCS) to quantitatively analyze the mobility of Dl along the DV axis of the early Drosophila embryo. Dl is essential for dorsal-ventral (DV) patterning and its gradient initiates the activation of several genes and thereby orchestrates the formation of the Drosophila body plan. While the mechanisms underlying the formation of the Dl gradient have been extensively studied by this group and others, there are some observations for which there is not yet a mechanistic explanation. For example, the peak of the Dl gradient grows continuously during nuclear cycles 10-14. This is likely due to Cact-dependent Dl diffusion and Dl binding to DNA. However, the biophysical parameters governing Dl nuclear dynamics that would support these claims have not been previously measured. In this work, the authors provide evidence that GFP-tagged Dl may be separated into a mobile pool and an immobile pool. Interestingly, the fraction of immobile Dl is position-dependent along the DV axis, revealing more binding to DNA in the ventral than in the dorsal nuclei. This is either due to higher binding affinity in ventral locations (due to Toll-dependent Dl phosphorylation) or to higher Dl-Cact binding in dorsal nuclei that would prevent Dl from binding to DNA. Using dl-mutant alleles, the authors support the latter hypothesis.

      Strengths:

      The manuscript is well written and their conclusions are convincingly supported by their methodology and analysis. As a quantitative study, the biophysical analysis seems rigorous, in general.

      Although this is not the first study that employs FCS to investigate the dynamics of a morphogen, it further exemplifies how these quantitative tools can be used to uncover mechanistic aspects of morphogen dynamics during development. In particular, the manuscript reports novel biophysical parameters of Dl dynamics that will be helpful in future hypothesis-driven modeling studies.

      Weaknesses:

      In my opinion, the main weakness of the manuscript is that the main biological implication of the study, namely that the asymmetry in the fraction of immobile Dl is a result of nuclear Dl-Cact binding which prevents Dl from binding DNA (Figure 5), occurs in a region of the embryo where there is very little Dl anyway (Figure 1A, 5A). While it is interesting that the fraction of immobile Dl increases (just a little, but significantly) in dorsal nuclei in mutants expressing a form of Dl with reduced Cact binding, it is unclear what the biological impact of this effect is in a location where Dl is nearly absent. As can be seen in Figure 3F, the immobile fraction is unaffected in Dl-mutant forms with reduced DNA binding, because it is already very low. It is unlikely that Dl binding to Cact in dorsal nuclei would affect shuttling either, since the fraction is very low anyway.

      We thank the reviewer for pointing out the places where we could strengthen our explanations. Here we first address the criticism, also raised by the other reviewer, that the fraction of immobile Dl increases only a small amount (Fig. 5A). [In our reply to the next comment, we address the question of biological implications.] We attempted to explain this small effect size in the manuscript; however, we understand that we could clarify further and, given that eLife places no constraints on length, we have added more explanation to the main text.

      In essence, even though the effect was statistically significant, the effect size was small because the mutation was “diluted” by the presence of a wildtype Dl protein tagged with GFP. We were willing to deal with this dilution because the alternative was that, according to previous literature, without any wildtype Dl, no Dl gradient would be present in the reduced Toll phosphorylation mutants, and only a very weak Dl gradient (weakened on both ends) would be present in mutants that reduced Cact binding. We were confident that, with our quantitative approaches, we would be able to detect the diluted effect.

      However, because both reviewers have criticized this diluted effect, in this resubmission, we have included analysis of GFP-tagged mutants without the presence of wildtype Dl protein. Unfortunately, these embryos lack a discernible Dl gradient and cannot be analyzed in such a way as to test the hypotheses that the mutants were generated for.

      Even so, the effect of the Cact-binding mutant was strong enough that we were able to statistically distinguish it from embryos expressing only wildtype Dl-GFP, even with the dilution effect. On the other hand, we have also included a caveat that our failure to statistically distinguish Toll phosphorylation mutants from wildtype may be due to the dilution effect. We now also explicitly state the concerns about a lack of a discernible Dl gradient and have included figures of full mutants in the supplement. See also our discussion of Reviewer 1’s similar comment.

      While the authors have a very clear understanding of the biology of the Dl gradient, I feel that the manuscript is written more as a 'tools' paper (i.e., to exemplify how FCS methods and analysis can be used for biological discovery). This is fine, but I think that the authors should discuss further what the biological implications of these findings are, beyond the contribution to uncovering the biophysical parameters.

      Here we underscore the biological implications of our discovery that Cact is present in the nucleus on the dorsal side. The reviewer mentioned that Cact in the nucleus on the dorsal side appears to have little overall effect, because this is the location of the embryo where there is very little Dl in the first place, which raises the question of whether this discovery is impactful.

      While we previously used the final paragraph of the discussion to touch on the implications of this discovery, we acknowledge that we could have spent more time on the explanation. As such, we have expanded this final paragraph into two paragraphs. In the first of the two, we discuss in more detail the implications specifically of the Dl/Cact interactions in the dorsal-most nuclei, as understood by the results of this paper. In brief, knowing that Dl in the dorsal-most nuclei is bound by Cact results in an updated understanding of the Dl gradient, with increased dynamic range, robustness, and precision (but unknown shape).

      In the second of the two paragraphs, we discuss this result in light of our recent work on imaging Cact in live embryos, in which we have shown that Cact is present in all nuclei at roughly uniform levels. Taken together, these results suggest that Cact may be bound to Dl in all nuclei (not just the dorsal-most), which would allow us to estimate the shape of the overall Dl gradient by subtracting off the fluorescence that stems from the Dl/Cact complex.

      For example, I think that the implications of the rejected hypothesis (i.e., that Toll-dependent Dl phosphorylation does not seem to have an impact on Dl binding affinities to DNA) are important and should be further discussed (even if no additional experiments are performed). What, then, is the role of Dl phosphorylation? Perhaps it could have an impact on patterning robustness in lateral regions. The authors should also report in Figure 5 what happens to the fraction of Dl bound to DNA in lateral regions in the reduced Cact binding and reduced Toll phosphorylation mutants.

      We appreciate the reviewer’s suggestion that the rejection of the hypothesis that phosphorylation of Dl by Toll impacts Dl/DNA binding could be expanded upon further. For the role of Dl phosphorylation by Toll: we previously mentioned that this phosphorylation is known to enhance the nuclear import or retention of Dl, and that mutation of serine 317 to an alanine abolishes Toll-mediated phosphorylation of Dl, which results in embryos with no Dl gradient. We had also mentioned that phosphorylation of Dl is not known to affect its DNA binding, which is the hypothesis we sought to test by creating the dl[S317A]-GFP mutants. We did not image any mutants, or the UAS-dl[wt]-GFP control, in the lateral regions, for two reasons. First, this region is easily the smallest of the three regions, in terms of the percentage of the DV axis (see Fig. 1A). Second, because of the dilution effect, we knew the effect size would be small, and as such, we imaged only on the extreme ends of the gradient so that the most clear conclusion could be drawn about the effect that Toll phosphorylation might have on DNA binding of Dl.

      The way that position along the DV axis is reported using the nuclear-cytoplasmic-ratio (NCR) in Figures 1-3 is not incorrect, but I wonder if it is the best way of doing it. The reason is that it spreads out a relatively small region of the embryo (the ventral-most locations) and shrinks a relatively large region of the embryo (lateral and dorsal regions), see Figure 1A. Perhaps reporting the NCR in log_2 units would be more appropriate.

      We agree that there is some distortion of the relative spatial extents of the Dorsal gradient when NCR is used as an independent variable on a plot. However, we prefer the NCR on the horizontal axis because it is closer to the functional variable (Dl concentration, rather than spatial location) for the properties we studied.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      I really enjoyed the first part of this paper and have only minor suggestions for improvement of the presentation. I am confused about the experimental approach for the final figure, distinguishing phosphorylation and cactus-dependent effects. I'll divide my comments between "First Part/General Suggestions", "Last Part", and finish with some minor typo observations.

      The gist of the issues with the last part of the paper could boil down to insufficient detail/explanation of the section. The discrepancy with the expectation from Michaelis-Menten kinetics is presented in only three sentences and is not necessarily obvious to the general readership of eLife. The mutants chosen to distinguish the phosphorylation and cactus mechanisms could be described in more detail (why these? aren't other residues phosphorylated?), as could the reason why also having wild-type GFP-Dl in the measurements isn't confounding. Since there is unlimited space in this journal, it may be advisable to use this space to fill out these rationales and ideas.

      First part/General Suggestions:

      (1) For the RICS data, (Figures 1 and 2) there is a nice correlation between WT NC ratio and the selected low/med/hi Dl activity mutants. More-or-less the median values in, say, Figure 1E-G are reflected in Figure 1H. However, with the ccRICS data (Figure 3), it looks like there is less correspondence between the range of fraction bound estimates in, for instance, "ventral" in Figure 3D and '10b' in Figure 3E. Can the authors comment on this? Should the reader be able to make this kind of comparison, or does something about data collection for the wt/NCR measurements preclude direct comparison of magnitudes with the panel of mutants? (imaging setup, laser power, etc)?

      The reviewer is correct that there seems to be a discrepancy in the values of ψ between the wt embryos (ventral side) and the Toll10B embryos. It should be noted that the Toll10B embryos are not “ventral-like” in every way, in part because they have unknown activated Toll levels that might be above or below what is seen at the ventral midline in wildtype embryos, and in part because there is no DV gradient, and thus no shuttling in these embryos that would accumulate total Dorsal on the ventral midline. As such, comparisons between Toll10B embryos and the ventral side of wildtype embryos are not exactly one-to-one, and we are more confident in comparing among the mutants in an allelic series. To address this question, we have added a sentence to the end of the second paragraph of the “Dorsal/DNA binding exhibits a spatial gradient” subsection of the Results (Lines 233-235).

      (2) Materials and methods: Mounting and imaging of Drosophila embryos: the authors cite the "488 nm laser intensity ranged from 0.5% to 3.0%..." The values presented here are not useful for the general reader or an individual looking to replicate these conditions, as emission power produced from such values will vary from instrument to instrument. It is standard in these cases to report an estimated laser power (measured in watts) for each laser line, and a clear description of how such measurements were made (stationary beam, under scanning conditions, with what detector, etc). These measurements are valuable and the authors are strongly encouraged to report such measurements for their setup.

      We appreciate the reviewer’s suggestion and understand the importance of providing absolute laser power values for reproducibility. We have now included the laser power (in watts) for the laser lines on both microscopes used in this study. The revised text can be found in the Materials and Methods section, on Lines 535-536 and 540.

      (3) The presentation of the data in Figure 4 is difficult to understand. Are the kymographs (A lower) representing the entire length of the big white arrow in A upper? Or do the dashed lines indicate the x-axis limits of the kymograph? It is difficult to tell from the figure legend, where the dashed lines are described as "areas where Dl-GFP movement is measured out of the nucleus." I believe that the authors can make these measurements and that Figure 4B reflects properties of "movement" of Dl out of the nucleus, but how they get there from these data is not clear to this reader. Perhaps a cartoon explaining the green lines and the orange lines in the kymograph or tightening the legend would help.

      We thank the reviewer for their feedback and understand the need for greater clarity in the text of the pCF section and in Figure 4. The widths of the kymographs in the lower panels correspond to the full widths of the images in the upper panels. The pCF measurements were taken at the y-coordinates at the level of the white arrows. The dashed vertical lines connecting the upper and lower panels illustrate two cases of locations along the x-axis of the image where Dl is crossing from inside a nucleus to outside. In the two illustrated cases, these crossings are accompanied by either zero Dl molecules being observed to cross the nuclear barrier (ventral image/kymograph on left) or delayed crossing of Dl molecules (dorsal image/kymograph on right). To address this concern, we have added more detail to the Fig. 4 legend and greatly expanded on a discussion of what pCF does in the text (the second and third paragraph of the section). We have also updated Fig. 4 to align with new explanations from the text: namely, describing the y-axis of the kymographs as Δt (instead of log(time)) and explicitly showing that the pair correlation is for pairs of pixels that are Δx = 6 pixels apart. Further details were also added to the relevant Methods section.
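      For readers unfamiliar with pCF, the sketch below shows the conventional pair-correlation estimator on a line-scan kymograph: the intensity at pixel x is correlated with the intensity Δx pixels away after a delay Δt, normalized by the mean intensities. A peak at some Δt > 0 indicates molecules travelling between the two positions on that time scale, while no peak indicates that molecules are not observed crossing. The array layout (rows = successive line scans, columns = pixels) and the lag range are assumptions; Δx = 6 pixels follows the text, but this is not the authors' analysis code.

```python
import numpy as np


def pair_correlation(kymo, dx=6, max_lag=50):
    """Pair correlation function G(x, tau) for a line-scan kymograph.

    kymo: 2-D array of shape (n_scans, n_pixels); row t is the line scan at time step t.
    dx: pixel separation between the two correlated positions (6 in the text).
    max_lag: largest time lag, in line-scan periods, to evaluate (assumed; must be < n_scans).
    Returns G with shape (n_pixels - dx, max_lag + 1), where
    G[x, tau] = <F(t, x) F(t + tau, x + dx)> / (<F(t, x)> <F(t + tau, x + dx)>) - 1.
    """
    kymo = np.asarray(kymo, dtype=float)
    n_scans, n_pixels = kymo.shape
    G = np.zeros((n_pixels - dx, max_lag + 1))
    for x in range(n_pixels - dx):
        near = kymo[:, x]
        far = kymo[:, x + dx]
        for tau in range(max_lag + 1):
            a = near[: n_scans - tau]  # F(t, x)
            b = far[tau:]              # F(t + tau, x + dx)
            G[x, tau] = np.mean(a * b) / (np.mean(a) * np.mean(b)) - 1.0
    return G
```

      Plotting G with pixel position on the horizontal axis and Δt on the vertical axis gives a pCF carpet comparable in layout to the kymographs described for Fig. 4.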

      (4) DV position in the wild-type imaging experiments is operationally determined through measurement of the Dorsal NC ratio. This makes sense, but the strategy is buried in the first paragraph of the Results and not discussed in the M & M. For readers unfamiliar with imaging the fly embryo or the nuances of the Dl gradient, a sentence or two explaining that embryos were oriented randomly along the DV axis, and that DV positions of the imaging region were estimated by measuring the Dl NC ratio, would help.

      We thank the reviewer for this helpful suggestion. To improve clarity, we have added a description of how DV position was determined to the Materials & Methods section (paragraph starting on Line 520). Specifically, we now state that embryos were randomly oriented along the DV axis and that we used the Dorsal NC ratio of intensity as a proxy for measuring the DV position in imaging experiments. Additionally, we have added a statement to the Results section to ensure that this strategy is more clearly introduced (Lines 143-144). We appreciate this recommendation, as it will help readers unfamiliar with fly embryo imaging better understand our approach.

      (5) It would be nice to report the corresponding NC-ratio values for Dl in each of the mutant conditions, perhaps as a supplement to Figure 1. Currently, Figure 1H relies on the (admittedly well-established) properties of the three mutants, but it feels that an additional nice quantitative link in the data can be drawn out here. Do the authors see the strict correlation between the wt and mutant diffusivity measurements at specific NC-ratios?

      We are hesitant to try to draw direct comparisons between the mutants and the behavior of the wildtype embryo at the corresponding NCR. This is because, in the context of these uniform mutants, the NCR is determined by a combination of at least three factors that we cannot measure or control for: the unknown strength of Toll signaling, the unknown capacity of Toll signaling (i.e., the potential saturation of the cytoplasmic enzymes controlled by Toll signaling), and, most importantly, the lack of a shuttling mechanism that concentrates Dl on the ventral side of the embryo. As such, the NCR does not represent a continuous variable that transforms the behavior of one mutant into another (or from mutants into wt DV coordinates), as it does along the DV axis in wildtype embryos. This is why the mutant studies are presented as boxplots. At best, we were comfortable using the uniform mutants only as an allelic series to reveal gross trends. We have added a brief statement describing the shuttling caveat to the Results section (Lines 173-177).

      (6) In the section related to Dl nuclear export, the language used to describe Dl kinetics is ambiguous. The term "movement" is used seemingly as a catch-all for nuclear import-export as distinguished from diffusion. However, diffusion is also a form of movement. Could this section be reworked to explicitly distinguish nuclear import-export and diffusive movements?

      We appreciate the reviewer’s suggestion and agree that the language used to describe Dl kinetics could be more precise. By way of explanation, the pCF analysis measures the time scale on which Dl exits the nucleus. pCF only gives a signal if it detects the same Dl molecule twice, at two different locations, after some time Δt has passed. Because of this, if a given Dl molecule in a ventral nucleus is being tracked, there is some probability that it is initially bound to DNA, which means it will take, on average, longer to exit the nucleus than a Dl molecule not initially bound to DNA. Therefore, on the ventral side, the time scale on which Dl exits the nucleus is longer than on the dorsal side (where DNA binding is not happening). This can be true even if the nuclear export rate constants are the same on the ventral and dorsal sides. As such, we were careful to choose language that did not imply that we were talking about a nuclear export rate constant. We have added this discussion to the end of the relevant Results section (Lines 308-315).
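      The distinction between an exit time scale and an export rate constant can be illustrated with a deliberately simplified model. This is a hypothetical sketch, not the pCF analysis itself: it assumes a nuclear Dl molecule is either free (exported at rate k_exp, or rebinding DNA at a pseudo-first-order rate k_on) or DNA-bound (unbinding at rate k_off), and computes the mean time to leave the nucleus from standard first-passage equations. The function name and all rate values are illustrative assumptions.

```python
def mean_exit_time(k_exp: float, k_on: float, k_off: float, p_bound: float) -> float:
    """Mean time for a nuclear molecule to leave the nucleus (hypothetical model).

    Free molecule: exported at rate k_exp or binds DNA at rate k_on.
    Bound molecule: unbinds at rate k_off (must be > 0).
    p_bound: probability the molecule is DNA-bound when observation starts.
    """
    if k_exp <= 0 or k_off <= 0:
        raise ValueError("k_exp and k_off must be positive")
    # First-passage equations:
    #   T_free  = 1/(k_exp + k_on) + k_on/(k_exp + k_on) * T_bound
    #   T_bound = 1/k_off + T_free
    # which solve to T_free = (1 + k_on/k_off) / k_exp.
    t_free = (1.0 + k_on / k_off) / k_exp
    t_bound = 1.0 / k_off + t_free
    return (1.0 - p_bound) * t_free + p_bound * t_bound


# Same (made-up) export rate constant on both sides; only DNA binding differs.
dorsal = mean_exit_time(k_exp=0.5, k_on=0.0, k_off=1.0, p_bound=0.0)
ventral = mean_exit_time(k_exp=0.5, k_on=2.0, k_off=1.0, p_bound=0.6)
print(f"{dorsal:.1f} s vs {ventral:.1f} s")  # 2.0 s vs 6.6 s
```

      In this toy model the longer ventral exit time arises entirely from DNA binding and rebinding, with k_exp held fixed.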

      We have also revised this section to explicitly distinguish between the mobility associated with exiting the nucleus and diffusive movement, while still trying to distinguish between the time scale of exiting the nucleus vs the nuclear export rate. Specifically, we now refer to ‘time scale of nuclear export’ when discussing transport across the nuclear envelope and reserve the term ‘diffusion’ for passive intracellular movement. Furthermore, we have edited a sentence in this section (Lines 291-293) to describe the distinction we are making between the time scale measured by pCF and the time scale commonly associated with nuclear export (that is, the reciprocal of the rate constant). We hope this clarification improves readability and conceptual clarity.

      Last Part:

      (1) There is an undersold argument centered on Michaelis-Menten kinetics that needs to be explicitly presented, especially since it motivates the final experiments of the paper, which are challenging. In the two sections describing how the data do not adhere to expectations based on Michaelis-Menten kinetics, the assertion that "the fraction of immobile Dl is expected to decrease with increasing nuclear total Dl concentration" is only intuitively true if the system is saturated. Is the system demonstrably saturated? Another interpretation of this would be that these results demonstrate that the system is likely not saturated. In any case, the authors need to devote some space in the introduction and/or results and/or discussion to fully motivate this point.

      We agree that the reviewer has raised an important point: if the system is very far from saturation, then the fraction of immobile Dl is not expected to decrease with increasing nuclear total Dl concentration. But neither would it increase; it would instead stay flat. To correct this mistake, we have edited the sentences in question to acknowledge the far-from-saturation scenario, saying “at best, [the fraction bound] remain[s] constant” (Line 209). As such, our original point, which is that in no case would the fraction immobile increase [unless something else is going on besides affinity-based binding to DNA], is still valid.
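      The claim that the bound fraction can stay flat or fall, but never rise, can be checked with a generic saturable binding model. The sketch below is an illustration under stated assumptions, not a model fitted in the study: a single class of independent DNA sites with hypothetical capacity b_max and dissociation constant k_d, with the bound fraction standing in for the immobile fraction. Algebraically, bound/total = b_max / (k_d + free + b_max), which is nearly constant far from saturation and decreases as saturation is approached.

```python
def fraction_bound(free: float, b_max: float, k_d: float) -> tuple[float, float]:
    """Return (total, bound/total) for a hypothetical single-class saturable binder.

    free: free (unbound) concentration, must be > 0; b_max: binding capacity;
    k_d: dissociation constant. Equivalent to b_max / (k_d + free + b_max),
    so the bound fraction can only stay flat or fall as total increases.
    """
    bound = b_max * free / (k_d + free)  # hyperbolic (saturating) occupancy
    total = free + bound
    return total, bound / total


# Sweep from far below to far above saturation (arbitrary units).
curve = [fraction_bound(c, b_max=1.0, k_d=1.0) for c in (0.01, 0.1, 1.0, 10.0, 100.0)]
for total, frac in curve:
    print(f"total={total:8.3f}  fraction bound={frac:.3f}")
```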

      (2) Wouldn't any argument on the basis of Michaelis-Menten need to rely on the assumption that the system is at steady-state? Reeves 2012 concludes that during the times measured here, Dl does not reach a steady state. It would be good, in the context of the point above, for the authors to clarify how this impacts the expectations of saturation and the application of M/M kinetics.

      We thank the reviewer for raising this important point. We apologize for not being clear on our points about M/M kinetics and would like to stress again that we are not claiming the system has M/M kinetics. We appealed to M/M kinetics only as a simple, intuitive example of a saturating system to point out the difference between bound concentration vs bound fraction as functions of total concentration. We did this because previous feedback on our manuscript suggested that the difference between these two variables needed to be made clearer. Because this point seemed controversial with both reviewers, we removed all mention of M/M kinetics and simply refer to the system as “saturating.” For further explanation, see the first paragraph of our response to Reviewer 1’s “weaknesses” in the public review.

      (3) It is not clear to me how the inclusion of wild-type, GFP-tagged dorsal in the experimental setup for Figure 5 is not confounding. For the S317 (phospho-) mutant, GFP-tagged alleles of both phospho- and wild-type Dl are expressed. The reasoning is that not enough phospho-mutant Dl gets into the nucleus, and this makes it difficult to distinguish the dorsal from the ventral side of the embryo, so in a dl mutant background, there is expression of wt GFP-dl from a BAC, and nos>Gal4 driven expression of a GFP-tagged S317A mutant dl. The measurements show that on the ventral side of the embryo, there is no difference in the fraction of bound Dl. Couldn't this be predominantly binding of wildtype GFP-Dl? How is this interpretable? Wouldn't it be easier to perform these measurements in a Tl 10b background (or to cross in UAS>Tl[10b]) and for the only GFP-tagged dl to be S317A? The same goes for the S234 mutant (could be done in the pelle mutant background).

      We thank the reviewer for raising the point that the confounding effect of wildtype Dl makes it difficult to interpret the results from the S317A mutant. Under the circumstances of the experimental design, we can best conclude that, if the null hypothesis is incorrect, the effect size was too small to detect with our sample size. As such, we have modified our discussion of the results of this experiment to carefully explain this caveat (rather than confidently saying that Toll phosphorylation has no effect). For further explanation, see the second paragraph of our response to Reviewer 1’s “weaknesses” in the public review, as well as our response to the related question raised by Reviewer 2 in the public review.

      Minor issues/typo stuff:

      (1) This reviewer notes that the submitted materials contain neither line numbers nor page numbers.

      We appreciate the reviewer’s feedback. We have now included line numbers and page numbers in the revised manuscript for easier reference.

      (2) First paragraph of results: "We imaged small regions of the embryo..." The parenthetical statement only cites pixel size and directs the reader to the methods. Without the total number of pixels, the pixel size value does not clarify how "small" the imaged region is. Consider including the xy area, pixel dimensions, and pixel size here to assert the smallness of the imaged area.

      We have added the requested information.

      (3) Second paragraph, Introduction: "Dorsal, one of three (Drosophila) homologs to mammalian NF-kB" (Add Drosophila). Also, aren't these orthologs?

      We have made these changes.

      (4) Last sentence of last paragraph in the introduction: Kind of a throw-away sentence. Consider revising.

      We thank the reviewer for making this point; the sentence was originally constructed to state that our quantitative measurements resulted in a biologically significant discovery. However, because Reviewer 2 also raised the question of biological significance, we have changed this final sentence to state the biological significance explicitly: namely, an understanding of the Dl gradient that has superior dynamic range, spatial range, robustness, and precision.

      (5) Where is the median line in the S317A boxplot in Fig 5C?

      The median line is at ψ = 0. We have added an explanation of this to the Figure legend.

      (6) Materials & Methods: Fly transformation, typo: "Drosophila embryos were injected with 0.5 µl of each pUAST construct..." The volume of an entire Drosophila embryo is less than 0.5 µl; please revise the units to reflect the value injected. Most likely an absolute volume was stated where the concentration of the injection solution (delivered in much smaller volumes) was intended.

      We thank the reviewer for catching this typo. It was intended to indicate a concentration of 0.5 ng/μL, and we have made the appropriate changes.

      Reviewer #2 (Recommendations for the authors):

      (1) Perhaps this has been described in a prior publication (if this is the case, please simply state this somewhere in the Methods section where Dl-GFP embryos are described), but since Dl-GFP embryos have one copy of endogenous dl and one copy of Dl-GFP, how do potential differences in tagged vs. non-tagged Dl interactions with DNA or Cact affect their findings?

      The reviewer brings up a good point, and we acknowledge that any time a protein is tagged with GFP, the behavior of the protein may be affected. We have now explicitly added this caveat to our discussion in a new paragraph on Lines 420-429.

      (2) In the Discussion section, the authors argue that a major implication of their findings, namely the possibility that Cact binds Dl in nuclei, is that the true (active) Dl gradient may be unknown unless unbound Dl is separated from the Dl/Cact (inactive) form. While this is an interesting point, this idea is not supported by the findings of Figure 5B, where there is no effect on the fraction of Dl bound to DNA in the mutants with reduced Cactus binding. The authors should report what happens in lateral regions in Figure 5 because perhaps there is an effect there (see comment on this in the Public Review).

      We thank the reviewer for the insight, as we did not directly discuss the implications of the middle column of Fig. 5B on our hypothesis. Indeed, our hypothesis is not supported by Fig. 5B; it is instead inconclusive (failure to reject H0). This is why we designed the second experiment (Fig. 5C) to test the Cactus hypothesis, because the effect size would be greater on the dorsal side.

      Furthermore, as pointed out by both reviewers, the presence of wildtype Dl-GFP in these experiments is confounding. We have discussed this elsewhere in our rebuttal, but briefly, this problem resulted in needing larger effect sizes to detect a statistically significant difference between wt and the mutant populations. This was a necessary evil that we were willing to deal with in order to ensure the Dl gradient could be established so that the dorsal vs ventral sides would be distinguishable. We have added a fuller discussion of these issues to the relevant Results section (Lines 333-336, 343-345, 354-359, 365-369) and also the Discussion section (Lines 412-418), including underscoring the fact that, from a falsification standpoint, the results in Fig. 5B do not allow us to reject either null hypothesis, possibly due to the confounding effect of wildtype Dl. We appreciate the reviewer’s point about this, and believe the changes suggested by the reviewer have improved the manuscript.
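      Why a wildtype background raises the effect size needed for significance can be sketched with a textbook normal-approximation power calculation. This is a hypothetical illustration, not an analysis from the study: it assumes the measured bound fraction of GFP-tagged Dl is a linear mixture of mutant and wildtype protein (so the observable difference shrinks in proportion to a hypothetical mutant_fraction), that the variance is unchanged, and a two-sided two-sample comparison at α = 0.05 with 80% power on a standardized effect size (Cohen's d).

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Normal-approximation sample size per group, two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def diluted_effect(effect_size: float, mutant_fraction: float) -> float:
    """Hypothetical: observed effect if the readout averages mutant and wildtype Dl."""
    return mutant_fraction * effect_size


full = n_per_group(1.0)                         # all tagged Dl is mutant
mixed = n_per_group(diluted_effect(1.0, 0.5))   # half of tagged Dl is wildtype
print(full, mixed)  # 16 63
```

      Under these assumptions the required sample size grows roughly as 1 / mutant_fraction², so a fixed sample size can only detect correspondingly larger underlying effects.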

      On the other hand, we respectfully disagree with the reviewer that investigating either mutant in the lateral regions of the embryo would bear fruit. To a first approximation, the behavior there would be the average of the behaviors on the ventral and dorsal sides. For the S317A mutant, neither the ventral nor the dorsal side was conclusive with regard to our hypotheses. (We acknowledge, however, that investigating why the S317A column in Fig. 5C was statistically different from wildtype, in the opposite direction from the S234P mutant, may be of interest in future work.) For the S234P mutant, the data were more conclusive on the side of the embryo where the effect size was expected to be large enough to detect a difference. In the lateral regions, the expectation would be that the effect size would be intermediate, which would make the interpretation of the results more difficult (i.e., more likely to be inconclusive). In contrast, as Fig. 5C is already conclusive, we are not confident there would be more information gained by imaging the lateral regions.

      (3) Is Figure 5A a wild-type embryo? If so, I think that the labels are misleading or unclear. Also, is it the same image as in Figure 1A? If so, I suggest replacing this with a schematic since it does not add any new data.

      We have eliminated the labels for the mutants and have added the following comment to the Figure 5 legend: “Same embryo as in Fig. 1A”.

      (4) Also in Figure 5, I suggest using labels to indicate the schematics instead of simply using their location. You could use 5A', 5A'' and 5A''', for example.

      We have made the suggested changes.

      (5) The use of some technical labels makes some figures difficult to read. I suggest using more simple labels for mutants in Figure 3F (replace R063C) or Figure 5B, C (replace S234P and S317A).

      We have made changes to Fig. 3F, Fig. 5B,C, and the corresponding places in the figure legends. We have labeled R063C as ↓DNA, S317A as ↓Toll, and S234P as ↓Cact.

      (6) I suggest reporting p-values consistently. For example, in Figure 4B, they use one or two asterisks to denote p-values less than 0.07 and 0.05, respectively, which is somewhat arbitrary and unconventional. Why not report the actual values as in Figure 5C, for example? (By the way, I would report in Figure 5B the actual p-values as well, since a nonsignificant value is also reported in Figure 5C. Also in Figure 5C, report values in the same notation (decimal or scientific), i.e., either put 0.005 as 5x10^-3 or 10^-3 as 0.001).

      We have made the suggested changes.

    1. eLife Assessment

      This study provides important insights regarding the temporal dynamics of dopamine across sleep/wake transitions in several brain areas. Using multi-site fiber photometry combined with EEG/EMG recordings, the study revealed heterogeneous dynamics across both cortical and several subcortical areas. Although the evidence for these observations is solid, evidence for the proposed mechanisms driving DA dynamics is incomplete. Overall, the study may have a substantial impact on several fields working on the neurobiology of DA signaling.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen, Tu, and Lu focused on how brain-wide dopamine release dynamically changes during sleep/wake state transitions. Using multi-site fiber photometry to monitor DA release, alongside simultaneous EEG and EMG recordings, the authors show distinct DA dynamics during transitions from NREM to WAKE, REM to WAKE, WAKE to NREM, and NREM to REM. Next, they analyze temporal coordination between regions using cross-correlation analysis. Finally, chemogenetic activation of VTA or DRN but not SNc dopamine neurons is shown to promote wakefulness.

      Strengths:

      The manuscript addresses an interesting question: how brainwide dopamine activity evolves across sleep/wake transitions. The combination of multi-site DA recordings with simultaneous EEG/EMG monitoring is technically sophisticated. The experimental logic is generally clear, and the dataset is rich. The results include several interesting observations.

      Weaknesses:

      The authors used the GRAB-DA2m sensor to monitor dopamine release. Although DA2m exhibits higher affinity for dopamine compared to NE (around 15-fold difference in EC50 in HEK cell assays), it is still possible that NE contributes to the recorded signals, particularly during sleep/wake transitions when locus coeruleus activity is strongly modulated. Given the widespread and state-dependent dynamics of NE, this potentially needs to be addressed.

      Similarly, the chemogenetic experiments rely on CNO to activate hM3Dq-expressing dopamine neurons. However, it is well established that CNO can be converted to clozapine in rodents, and clozapine itself is known to influence sleep/wake. Although the authors included non-hM3Dq-expressing mice as controls, the potential confounding effects of clozapine on sleep regulation remain a concern.

      Midbrain dopamine neurons exhibit both tonic and phasic firing patterns. In Figure 1, most reported dopamine transitions appear relatively slow. However, some faster, phasic-like components are observable. For example, in NAc-L during REM-to-WAKE transitions, there are 2 phasic-like decreases between −20 and 0 s. The authors used laser-evoked stimulation experiments in the VTA and DRN and showed that 2 s versus 10 s stimulation produces distinct dopamine kinetics, suggesting that different firing patterns generate distinct DA dynamics. Moreover, the temporal profiles vary not only across regions but also across transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid decrease, whereas REM-to-WAKE displays a much slower decline. Similarly, some regions (e.g., NAc-L NREM-to-WAKE, DRN REM-to-WAKE) show faster changes, while others (e.g., mPFC WAKE-to-NREM, VTA NREM-to-WAKE) show slower kinetics. These observations argue against a simple region-specific explanation and instead suggest that distinct firing modes may differentially contribute depending on transition type.

      While cross-correlation analysis provides insight into the temporal coordination of DA signals across regions, several limitations should be considered. Sleep/wake transitions are inherently non-stationary events, whereas cross-correlation assumes relatively stable signal properties within the analysis window. This mismatch may bias lag estimates and obscure transient lead-lag relationships. Moreover, the temporal resolution of fiber photometry and the kinetics of genetically encoded DA sensors limit the precision with which timing relationships can be interpreted, particularly for sub-second lags.

      In the Introduction, the authors state that they aim to address 'which dopaminergic populations causally drive these patterns.' However, the chemogenetic approach used operates on a relatively slow timescale: CNO-induced activation takes 15-30 minutes to produce effects, and the induced changes are long-lasting. In contrast, the dopamine transitions described in Figure 1 occur on a much faster timescale compared to CNO manipulation. Thus, while chemogenetic activation demonstrates that stimulating VTA or DRN dopamine neurons promotes wakefulness, it does not directly establish that these populations causally drive the rapid transition-related DA dynamics observed in the photometry recordings.

    3. Reviewer #2 (Public review):

      In "Brainwide dopamine dynamics across sleep-wake transitions", Chen et al. provide a thorough description of how dopamine dynamics fluctuate across sleep-wake transitions and in transitions between sleep states. To achieve this, the authors used multi-channel fiber photometry and a genetically encoded fluorescent dopamine reporter to simultaneously measure dopamine dynamics in 8 brain regions. They also used EEG measurements to precisely quantify and time transitions between sleep states and wakefulness. Finally, the authors used channelrhodopsin to examine dopamine dynamics following subregion stimulation and chemogenetics to test the causal relationship between activation of distinct dopamine neuron populations and their effects on sleep state.

      The conclusions made by the authors in this study are modest and appropriate given the largely observational nature of the principal findings. The use of optogenetics to probe regional dopamine signaling following activation of distinct nuclei is interesting, but not entirely novel and constrained in interpretability. Similarly, the chemogenetics experiment largely confirms previous studies, which the authors correctly cited in the text.

      The principal findings of this study are based on strong methodological and analytical approaches. Implanting 8 optical fibers in a single mouse, along with EEG/EMG electrodes, is technically challenging, providing valuable, simultaneous measurements of dopamine fluctuations across the brain. This enables the strong correlational and time-locked analyses performed by the authors in Figure 2. What's more, the use of EEG/EMG electrodes provides time-locked descriptions of sleep states, enabling precise comparisons between the dopamine signal and sleep state transitions.

      The paper has some weaknesses that the authors could address. The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states. The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results. Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Chen, Tu, and Lu focused on how brain-wide dopamine release dynamically changes during sleep/wake state transitions. Using multi-site fiber photometry to monitor DA release, alongside simultaneous EEG and EMG recordings, the authors show distinct DA dynamics during transitions from NREM to WAKE, REM to WAKE, WAKE to NREM, and NREM to REM. Next, they analyze temporal coordination between regions using cross-correlation analysis. Finally, chemogenetic activation of VTA or DRN but not SNc dopamine neurons is shown to promote wakefulness.

      Strengths:

      The manuscript addresses an interesting question: how brainwide dopamine activity evolves across sleep/wake transitions. The combination of multi-site DA recordings with simultaneous EEG/EMG monitoring is technically sophisticated. The experimental logic is generally clear, and the dataset is rich. The results include several interesting observations.

      Weaknesses:

      The authors used the GRAB-DA2m sensor to monitor dopamine release. Although DA2m exhibits higher affinity for dopamine compared to NE (around 15-fold difference in EC50 in HEK cell assays), it is still possible that NE contributes to the recorded signals, particularly during sleep/wake transitions when locus coeruleus activity is strongly modulated. Given the widespread and state-dependent dynamics of NE, this potentially needs to be addressed.

      We thank the reviewer for raising this important methodological consideration. While we acknowledge that a minor contribution from norepinephrine (NE) to the DA2m signal cannot be categorically excluded, several convergent lines of evidence give us confidence that the signals we recorded primarily reflect dopamine release.

      First, DA2m has substantially lower affinity for NE than for dopamine. The reported EC50 for NE is ~1200 nM [1], which is ~15-fold higher than for dopamine. In contrast, extracellular NE levels in the prefrontal cortex are typically in the low nanomolar range (generally <5 nM under basal conditions) [2,3]. Because physiological NE concentrations are more than two orders of magnitude below the sensor’s EC50 for NE, NE is highly unlikely to drive significant DA2m activation in vivo.

      Second, our optogenetic experiments provide direct functional validation. The targeted stimulation of midbrain dopaminergic neurons elicited robust DA2m signal responses across both cortical and subcortical brain areas. This confirms that the sensor reliably captures evoked dopamine release within our specific experimental paradigm.

      Finally, the spontaneous DA2m signal dynamics we observed across sleep-wake states functionally diverge from previously reported patterns of cortical NE release [4]. For example, in Figure 1C, our DA2m recordings in the mPFC revealed high activity during wakefulness, alongside pronounced, sharp changes during NREM-to-WAKE transitions. In contrast, a prior study [4] showed that NE exhibits comparatively mild fluctuations during wakefulness and transitions between NREM. This temporal and kinetic divergence further supports that our recorded signals isolate region-specific dopaminergic dynamics rather than generalized NE arousal activity.

      Taken together, these physiological, functional, and kinetic distinctions indicate that while a negligible contribution from NE cannot be entirely ruled out, it is highly unlikely to account for a substantial portion of the DA2m signals observed during sleep-wake transitions in our study.
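      The first point above can be checked with a back-of-the-envelope occupancy estimate. The sketch assumes the sensor's steady-state response follows a Hill equation with a Hill coefficient of 1 (an assumption; no coefficient is stated here) and uses only the values cited in this reply: an NE EC50 of ~1200 nM and a basal NE concentration of at most ~5 nM. The function name is hypothetical.

```python
def fractional_response(conc_nm: float, ec50_nm: float, hill: float = 1.0) -> float:
    """Predicted fraction of maximal sensor response (Hill equation; hill=1 assumed)."""
    return conc_nm**hill / (conc_nm**hill + ec50_nm**hill)


NE_EC50_NM = 1200.0   # NE EC50 cited in the reply
NE_BASAL_NM = 5.0     # upper bound on basal NE cited in the reply
print(f"{fractional_response(NE_BASAL_NM, NE_EC50_NM):.2%}")  # 0.41%
```

      Under these assumptions basal NE would evoke well under 1% of the sensor's maximal response; because concentration is far below EC50, the estimate scales roughly linearly if NE rises above this basal value.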

      Similarly, the chemogenetic experiments rely on CNO to activate hM3Dq-expressing dopamine neurons. However, it is well established that CNO can be converted to clozapine in rodents, and clozapine itself is known to influence sleep/wake. Although the authors included non-hM3Dq-expressing mice as controls, the potential confounding effects of clozapine on sleep regulation remain a concern.

      We appreciate the reviewer raising this important point regarding the metabolism of CNO. We are aware of the evidence that CNO can be back-metabolized to clozapine in rodents, which could exert independent effects on sleep-wake architecture. To mitigate this concern, we employed the following experimental safeguards:

      (A) Non-hM3Dq Control Group: As noted by the reviewer, we included a cohort of mice that did not express the hM3Dq receptor but received the same dosage of CNO (1 mg/kg). In these animals, we observed no significant alterations in sleep-wake states compared to saline baseline (Figure S3), suggesting that at this dosage, any clozapine produced was below the threshold for behavioral modulation of sleep.

      (B) Dosage Selection: We utilized a relatively low dose of CNO (1 mg/kg), which is widely reported in the literature to minimize the accumulation of clozapine to levels that would interfere with EEG-defined sleep states in rodents [5]. Furthermore, studies have demonstrated that while higher doses of CNO (e.g., 5–10 mg/kg) can produce clozapine-like effects on sleep architecture, lower doses around 1 mg/kg do not yield significant alterations in cortical EEG power distribution or sleep-wake amounts in control animals [6,7].

      Midbrain dopamine neurons exhibit both tonic and phasic firing patterns. In Figure 1, most reported dopamine transitions appear relatively slow. However, some faster, phasic-like components are observable. For example, in NAc-L during REM-to-WAKE transitions, there are 2 phasic-like decreases between −20 and 0 s. The authors used laser-evoked stimulation experiments in the VTA and DRN and showed that 2 s versus 10 s stimulation produces distinct dopamine kinetics, suggesting that different firing patterns generate distinct DA dynamics. Moreover, the temporal profiles vary not only across regions but also across transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid decrease, whereas REM-to-WAKE displays a much slower decline. Similarly, some regions (e.g., NAc-L NREM-to-WAKE, DRN REM-to-WAKE) show faster changes, while others (e.g., mPFC WAKE-to-NREM, VTA NREM-to-WAKE) show slower kinetics. These observations argue against a simple region-specific explanation and instead suggest that distinct firing modes may differentially contribute depending on transition type.

      We thank the reviewer for this insightful comment. We agree that midbrain dopamine neurons exhibit both tonic and phasic action-potential firing patterns. As summarized by Grace et al., dopamine neurons recorded using in vivo electrophysiology can display a slow, irregular, single-spike “tonic” firing pattern, typically around 2–10 Hz, as well as burst-like “phasic” firing patterns [8].

      However, our recordings were performed using GRAB-DA2m fiber photometry. Therefore, our measurements reflect extracellular dopamine dynamics in the recorded target regions rather than the action-potential firing patterns of midbrain dopamine neurons. GRAB-DA2m has sub-second sensor kinetics and is suitable for detecting extracellular dopamine transients occurring over hundreds of milliseconds to seconds, as well as slower dynamics occurring over seconds to tens of seconds [1], which matches the timescale of the sleep–wake transition-related dynamics observed in previous studies [9,10]. Nevertheless, GRAB-DA2m fiber photometry in our study does not directly resolve dopamine neuron spike timing or distinguish tonic from phasic firing modes. Accordingly, we interpret our signals as extracellular dopamine concentration dynamics rather than as direct measurements of tonic or phasic neuronal firing.

      Therefore, the transition-aligned dopamine signals shown in Figure 1 should be interpreted as dopamine dynamics occurring over seconds-to-tens-of-seconds around sleep–wake transitions, rather than as dopamine neuron firing patterns. In addition, these traces represent GRAB-DA2m signals averaged across sessions and mice within a ±30 s window centered on each sleep/wake transition. Thus, they do not necessarily represent individual dopamine transient patterns on single transitions. We also acknowledge the reviewer’s observation that faster phasic-like components are visible in some traces, including the decreases in the NAc-L preceding REM-to-WAKE transitions. Direct electrophysiological recordings of dopamine neuron firing during sleep–wake transitions would be useful in future studies to determine how tonic and phasic firing modes contribute to the observed dopamine dynamics.

      We thank the reviewer for the thoughtful interpretation of the laser-evoked stimulation experiments shown in Figure 3. The results indicate that different stimulation durations can produce distinct dopamine release dynamics in downstream projection regions. Moreover, prolonged optogenetic stimulation was associated with more sustained dopamine responses, suggesting that the temporal profile of extracellular dopamine dynamics depends, at least in part, on the duration and region of dopaminergic input [1]. We also agree with the reviewer that the temporal profiles of the GRAB-DA2m signals vary not only across regions, but also across sleep/wake transitions within the same region. For example, in CeA, the NREM-to-WAKE transition shows a relatively rapid dopamine decrease, whereas the REM-to-WAKE transition displays a slower decline.

      Similarly, faster dopamine changes are observed in some region/transition combinations, such as NAc-L during NREM-to-WAKE and DRN during REM-to-WAKE, whereas slower kinetics are observed in others, such as mPFC during WAKE-to-NREM and VTA during NREM-to-WAKE. Together, these effects reflect both region-specific mechanisms and transition-dependent differences in dopaminergic activity.

      While cross-correlation analysis provides insight into the temporal coordination of DA signals across regions, several limitations should be considered. Sleep/wake transitions are inherently non-stationary events, whereas cross-correlation assumes relatively stable signal properties within the analysis window. This mismatch may bias lag estimates and obscure transient lead-lag relationships. Moreover, the temporal resolution of fiber photometry and the kinetics of genetically encoded DA sensors limit the precision with which timing relationships can be interpreted, particularly for sub-second lags.

      We thank the reviewer for raising these important considerations. The temporal relationships between regional dopamine signals were assessed using cross-covariance analysis. We agree that cross-covariance analysis has limitations when applied to sleep/wake transitions, because these transitions are inherently non-stationary events. Although cross-covariance centers the signals by subtracting their means and is therefore less sensitive to baseline offsets than raw cross-correlation, it still summarizes the lag-dependent covariance between two signals over the selected analysis window. Therefore, the inferred lag should be interpreted as a transition-level measure of temporal coordination rather than a precise estimate of instantaneous lead–lag timing.
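
To make the point about window-level lag estimates concrete, here is a minimal Python sketch of a normalized cross-covariance between two equally sampled traces and the lag at which it peaks. It is an illustration, not the authors' analysis code: the function names, the biased (1/n) normalization and the sign convention (a positive lag means the first trace leads the second) are our assumptions. Because the whole window collapses into one peak lag, a transient lead that reverses within the window would be averaged away, which is the limitation acknowledged above.

```python
import math


def cross_covariance(x, y, max_lag):
    """Normalized cross-covariance of two equal-length, evenly sampled traces.

    Returns {lag_in_samples: value}. The value at lag k pairs x[t] with
    y[t + k], so a peak at a positive lag means x leads y.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("traces must have equal length")
    # Centre both traces (this is what makes it a covariance, not a raw correlation)
    mx, my = sum(x) / n, sum(y) / n
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    # Zero-lag standard deviations; dividing by n*sx*sy keeps values in [-1, 1]
    sx = math.sqrt(sum(v * v for v in xc) / n)
    sy = math.sqrt(sum(v * v for v in yc) / n)
    if sx == 0 or sy == 0:
        raise ValueError("a flat trace has no defined cross-covariance")
    out = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            pairs = zip(xc[: n - k], yc[k:])
        else:
            pairs = zip(xc[-k:], yc[: n + k])
        out[k] = sum(a * b for a, b in pairs) / (n * sx * sy)
    return out


def peak_lag(x, y, max_lag, fs):
    """Return (lag_in_seconds, peak_value) for the largest cross-covariance."""
    cc = cross_covariance(x, y, max_lag)
    best = max(cc, key=cc.get)
    return best / fs, cc[best]
```

For example, sampling at fs = 10 Hz, a Gaussian bump in `y` delayed by 15 samples relative to the same bump in `x` gives a peak lag of +1.5 s.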

      To minimize the influence of brief or unstable state fluctuations, we included only transitions in which both the preceding and following sleep/wake epochs lasted at least 30 s, thereby excluding epochs shorter than 30 s [4]. This criterion helped ensure that the analyzed events represented well-defined transitions between sustained behavioral states rather than transient or fragmented episodes. Although dopamine signals may still change dynamically within the transition window, and the temporal resolution of fiber photometry and the kinetics of the genetically encoded GRAB-DA2m sensor limit the precision with which fine-scale timing relationships can be interpreted, dopamine signals were relatively stable within each behavioral state, as shown in Fig. 1B and reported previously [1,9,10]. Thus, we believe that cross-covariance analysis provides useful information about the temporal coordination of dopamine dynamics across regions.
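
The 30 s inclusion rule can be sketched as a filter over a scored hypnogram. This Python example assumes the hypnogram is a list of contiguous `(state, start_s, end_s)` epochs; that representation and the function name `qualifying_transitions` are hypothetical, not taken from the authors' pipeline. A transition is kept only when both the epoch before it and the epoch after it last at least `min_dur` seconds, so one short epoch between two long ones removes the transitions on both of its sides.

```python
from typing import List, Tuple

# (state, start_s, end_s); epochs are assumed contiguous and in time order
Epoch = Tuple[str, float, float]


def qualifying_transitions(epochs: List[Epoch], min_dur: float = 30.0) -> List[Tuple[float, str]]:
    """Return (onset_s, "A-to-B") for state changes flanked by long-enough epochs."""
    out = []
    for (s0, a0, b0), (s1, a1, b1) in zip(epochs, epochs[1:]):
        if s0 == s1:
            continue  # same state on both sides: not a transition
        if (b0 - a0) < min_dur or (b1 - a1) < min_dur:
            continue  # a flanking epoch is shorter than the inclusion threshold
        out.append((a1, f"{s0}-to-{s1}"))
    return out
```

For a hypnogram NREM (0 to 120 s), WAKE (120 to 200 s), NREM (200 to 215 s), REM (215 to 300 s), WAKE (300 to 400 s), only the NREM-to-WAKE transition at 120 s and the REM-to-WAKE transition at 300 s survive, because the 15 s NREM epoch disqualifies both of its neighbors.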

      In the Introduction, the authors state that they aim to address 'which dopaminergic populations causally drive these patterns.' However, the chemogenetic approach used operates on a relatively slow timescale: CNO-induced activation takes 15-30 minutes to produce effects, and the induced changes are long-lasting. In contrast, the dopamine transitions described in Figure 1 occur on a much faster timescale compared to CNO manipulation. Thus, while chemogenetic activation demonstrates that stimulating VTA or DRN dopamine neurons promotes wakefulness, it does not directly establish that these populations causally drive the rapid transition-related DA dynamics observed in the photometry recordings.

      We thank the reviewer for this thoughtful comment. We agree that chemogenetic manipulation operates on a much slower timescale than the rapid dopamine transients observed during sleep–wake transitions, and therefore does not directly recapitulate these fast dynamics. In particular, CNO-induced activation unfolds over minutes and produces sustained changes in neuronal activity, whereas the DA signals we report fluctuate on a sub-second to second timescale. Our intention with the chemogenetic experiments was not to mimic the precise temporal profile of endogenous DA signals, but rather to test whether increasing the activity of specific dopaminergic populations is sufficient to influence behavioral state.

      In this context, our results show that activation of VTA or DRN dopaminergic neurons robustly promotes wakefulness, supporting a causal role for these populations in sleep–wake regulation at the circuit level. However, we agree that these data do not by themselves establish that these neurons directly generate the rapid transition-related DA dynamics observed in the photometry recordings.

      Reviewer #2 (Public review):

      In "Brainwide dopamine dynamics across sleep-wake transitions", Chen et al. provide a thorough description of how dopamine dynamics fluctuate across sleep-wake transitions and in transitions between sleep states. To achieve this, the authors used multi-channel fiber photometry and a genetically encoded fluorescent dopamine reporter to simultaneously measure dopamine dynamics in 8 brain regions. They also used EEG measurements to precisely quantify and time transitions between sleep states and wakefulness. Finally, the authors used channelrhodopsin to examine dopamine dynamics following subregion stimulation and chemogenetics to test the causal relationship between activation of distinct dopamine neuron populations and their effects on sleep state.

      The conclusions made by the authors in this study are modest and appropriate given the largely observational nature of the principal findings. The use of optogenetics to probe regional dopamine signaling following activation of distinct nuclei is interesting, but not entirely novel and constrained in interpretability. Similarly, the chemogenetics experiment largely confirms previous studies, which the authors correctly cited in the text.

      The principal findings of this study are based on strong methodological and analytical methods. Implanting 8 optical fibers in a single mouse, along with EEG/EMG electrodes, is technically challenging, providing valuable, simultaneous measurements of dopamine fluctuations across the brain. This enables the strong correlational and time-locked analyses performed by the authors in Figure 2. What's more, the use of EEG/EMG electrodes provides time-locked descriptions of sleep states, enabling precise comparisons between the dopamine signal and sleep state transitions.

      The paper has some weaknesses that the authors could address. The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states. The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results. Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      The analyses in Figure 1 could be strengthened to show how dopamine changes during transitions between specific sleep states.

      We appreciate the reviewer’s thoughtful suggestion. We agree that the directionality and kinetics of dopamine changes during sleep/wake transitions may provide important information beyond state-level dopamine quantification.

      In this study, mice were recorded for 4–5 h during each sleep session. Across the recording period, mice frequently transitioned from NREM to WAKE, WAKE to NREM, NREM to REM, and REM to WAKE. Transitions from WAKE to REM were rarely observed and therefore were not included in the transition analysis. Accordingly, we focused our analysis on the four major transition types: NREM-to-WAKE, WAKE-to-NREM, NREM-to-REM, and REM-to-WAKE [4,9,11].

      For each transition type, dopamine dynamics were analyzed separately by aligning the z-scored GRAB-DA2m signal to the transition onset and averaging across all epochs of the same transition type. To minimize the influence of brief or unstable state fluctuations, we excluded transitions in which either the preceding or following sleep/wake epoch lasted less than 30 s. The resulting transition-triggered dopamine traces were then averaged across sessions and mice for each transition type independently.
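
A minimal sketch of the transition-triggered averaging step, assuming one evenly sampled GRAB-DA2m trace per session and z-scoring against the whole-session mean and standard deviation. The response does not specify the z-scoring baseline, so that choice is an assumption, as is the hypothetical helper name `triggered_average`. Transitions whose ±30 s window would run past the start or end of the recording are skipped.

```python
import statistics


def zscore(signal):
    """Z-score against the whole-trace mean and SD (baseline choice is an assumption)."""
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    return [(v - mu) / sd for v in signal]


def triggered_average(signal, fs, onsets_s, window_s=30.0):
    """Average the z-scored trace in a +/- window_s window around each onset.

    Returns (time_axis_s, mean_trace, n_used). Onsets whose window would
    run past either end of the recording are skipped.
    """
    z = zscore(signal)
    half = int(round(window_s * fs))
    segments = []
    for t in onsets_s:
        i = int(round(t * fs))
        if i - half < 0 or i + half >= len(z):
            continue
        segments.append(z[i - half : i + half + 1])
    if not segments:
        raise ValueError("no transition has a complete window")
    n_used = len(segments)
    # Column-wise mean across all aligned segments
    mean_trace = [sum(col) / n_used for col in zip(*segments)]
    time_axis = [(k - half) / fs for k in range(2 * half + 1)]
    return time_axis, mean_trace, n_used
```

In this sketch, each session yields one mean trace per transition type; those per-session traces would then be averaged across sessions and mice.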

      Thus, the transition analysis preserves the directionality of state changes rather than pooling all sleep/wake transitions together. Because dopamine signals differ across behavioral states, transitions between neighboring states produce distinct temporal profiles when aligned to the transition point [4,9-11]. For example, REM-to-WAKE transitions may show a rapid increase in dopamine in the mPFC, whereas WAKE-to-NREM or NREM-to-REM transitions may show slower and more modest decreases. These transition-specific kinetics may reflect distinct underlying mechanisms, including changes in dopamine neuron firing or local terminal modulation.

      The injection sites for channelrhodopsin and chemogenetic viruses could be validated to strengthen the interpretation of those results.

      We agree with the reviewer that precise histological validation is essential for the correct interpretation of our optogenetic and chemogenetic findings.

      Regarding the chemogenetic experiments, as noted, we provide examples of viral expression in the VTA, DRN, and SNc in Figure 4. By demonstrating consistent and restricted targeting across the entire cohort (VTA, SNc, and DRN), we confirmed that the observed sleep effects were regionally specific. Only mice with accurate targeting and no substantial viral "leakage" into adjacent nuclei were included in our data.

      We thank the reviewer for this insightful observation regarding the regional dopamine (DA) responses following SNc stimulation. While the SNc is traditionally associated with the dorsal striatum (DLS), several studies have demonstrated that SNc dopaminergic neurons also project to the nucleus accumbens, particularly the lateral shell [12,13]. Furthermore, recent work characterizing the functional heterogeneity of midbrain DA neurons suggests that SNc subpopulations can drive significant DA release in ventral striatal subregions [14]. We appreciate the reviewer’s caution regarding potential off-target effects. Although our post-recording histological validation criteria were stringent, we acknowledge that, in any midbrain manipulation, the close anatomical proximity of the VTA and SNc makes it technically challenging to guarantee zero involvement of neighboring VTA neurons. However, by including only mice with the most restricted viral expression and accurate fiber targeting, we have minimized this potential confound as far as is technically feasible with current viral and optogenetic methods.

      Also, a stronger justification for the experiments conducted in Figure 3 could be provided, as they seem unrelated to the present study.

      We thank the reviewer for this comment. The experiments in Figure 3 were designed to systematically map the sources of dopaminergic inputs to key brain regions examined in this study [15], including the mPFC, DLS, NAc, and CeA. Establishing these input–output relationships is important for interpreting the photometry signals observed during sleep–wake transitions.

      Specifically, we found that optogenetic activation of VTA dopaminergic neurons elicits DA responses in all four regions, whereas activation of DRN dopaminergic neurons induces responses in the mPFC, DLS, and CeA, and activation of SNc dopaminergic neurons induces responses in the mPFC, NAc, and DLS. These results reveal partially overlapping but distinct projection patterns across dopaminergic populations.

      Taken together, these data provide a circuit-level framework suggesting that VTA, SNc, and DRN dopaminergic neurons may contribute differentially and with distinct weights to the DA signals observed in these regions during sleep–wake transitions.

      Overall, this study has strong descriptive power, convincingly showing how dopamine fluctuates across sleep states. Some of the other aspects of the paper, however, are somewhat limited in novelty and interpretation.

      We appreciate the reviewer’s assessment that our study convincingly demonstrates how dopamine fluctuates across sleep states. We agree that the primary contribution of this work is descriptive and foundational. At the same time, we respectfully emphasize that rigorous, comprehensive descriptive studies are essential, particularly when addressing phenomena that have not been systematically characterized. Prior to this work, dopamine dynamics during natural sleep–wake transitions had not been measured simultaneously across multiple brain regions.

      Our multi-site photometry approach advances the field in several important ways. Technically, the combination of simultaneous eight-region fiber photometry with EEG/EMG recordings represents a substantial methodological advance, enabling brainwide, network-level analysis of dopamine dynamics during natural state transitions. This approach reveals emergent features, such as temporal coordination and inter-regional lead–lag relationships, that cannot be captured using single-site recordings. Moreover, integrating brain-wide measurements with region-specific manipulations allows circuit-level insights that would not be accessible from either approach alone.

      Conceptually, our findings reveal region- and transition-type-specific, bidirectional dopamine dynamics that contrast with the prevailing view of dopamine as a uniform arousal signal: during arousal transitions, dopamine decreases in certain limbic regions, such as the central amygdala and the lateral shell of the nucleus accumbens, while it increases in cortical and other striatal regions. These results refine simplified models of dopaminergic regulation of arousal. In addition, our data reveal differential circuit contributions, with the VTA and DRN, but not the SNc, promoting wakefulness, highlighting functional specialization within the dopamine system.

      We acknowledge that some aspects of our study, including the optogenetic mapping and chemogenetic experiments, build on established methodologies and in part confirm prior findings. However, these experiments also provide several new insights. First, whereas individual dopamine sources have often been studied in isolation, our systematic comparison across VTA, SNc, and DRN using consistent methods reveals distinct brainwide functional contributions that were not previously established. Second, our optogenetic mapping does not simply recapitulate known projection patterns, but instead uncovers quantitative differences in dopamine release kinetics and magnitude across source–target pairs, which inform the heterogeneity of the transition dynamics. Finally, our findings provide a crucial anatomical and temporal framework for future research on the specific mechanisms driving these dynamics and their precise functional consequences.

      References:

      (1) Sun, F. et al. Next-generation GRAB sensors for monitoring dopaminergic activity in vivo. Nat Methods 17, 1156-1166, doi:10.1038/s41592-020-00981-9 (2020).

      (2) Ihalainen, J. A., Riekkinen, P., Jr. & Feenstra, M. G. Comparison of dopamine and noradrenaline release in mouse prefrontal cortex, striatum and hippocampus using microdialysis. Neurosci Lett 277, 71-74, doi:10.1016/s0304-3940(99)00840-x (1999).

      (3) Berridge, C. W. & Abercrombie, E. D. Relationship between locus coeruleus discharge rates and rates of norepinephrine release within neocortex as assessed by in vivo microdialysis. Neuroscience 93, 1263-1270, doi:10.1016/s0306-4522(99)00276-6 (1999).

      (4) Silverman, D. et al. Activation of locus coeruleus noradrenergic neurons rapidly drives homeostatic sleep pressure. Sci Adv 11, eadq0651, doi:10.1126/sciadv.adq0651 (2025).

      (5) Anaclet, C. et al. The GABAergic parafacial zone is a medullary slow wave sleep-promoting center (vol 17, pg 1217, 2014). Nat Neurosci 17, 1841, doi:10.1038/nn1214-1841d (2014).

      (6) Ma, C. Y. et al. Microglia regulate sleep through calcium-dependent modulation of norepinephrine transmission. Nat Neurosci 27, 249-258, doi:10.1038/s41593-023-01548-5 (2024).

      (7) Traut, J. et al. Effects of clozapine-N-oxide and compound 21 on sleep in laboratory mice. Elife 12, doi:10.7554/eLife.84740 (2023).

      (8) Grace, A. A., Floresco, S. B., Goto, Y. & Lodge, D. J. Regulation of firing of dopaminergic neurons and control of goal-directed behaviors. Trends Neurosci 30, 220-227, doi:10.1016/j.tins.2007.03.003 (2007).

      (9) Darmohray, D. et al. Brainstem circuit for sickness-induced sleep. Sci Adv 11, eady0245, doi:10.1126/sciadv.ady0245 (2025).

      (10) Hasegawa, E. et al. Rapid eye movement sleep is initiated by basolateral amygdala dopamine signaling in mice. Science 375, 994, doi:10.1126/science.abl6618 (2022).

      (11) Ding, X. et al. Neuroendocrine circuit for sleep-dependent growth hormone release. Cell 188, 4968-4979.e12, doi:10.1016/j.cell.2025.05.039 (2025).

      (12) Poulin, J. F. et al. Mapping projections of molecularly defined dopamine neuron subtypes using intersectional genetic approaches. Nat Neurosci 21, 1260-1271, doi:10.1038/s41593-018-0203-4 (2018).

      (13) Lerner, T. N. et al. Intact-Brain Analyses Reveal Distinct Information Carried by SNc Dopamine Subcircuits. Cell 162, 635-647, doi:10.1016/j.cell.2015.07.014 (2015).

      (14) Azcorra, M. et al. Unique functional responses differentially map onto genetic subtypes of dopamine neurons. Nat Neurosci 26, 1762-1774, doi:10.1038/s41593-023-01401-9 (2023).

      (15) Eban-Rothschild, A., Rothschild, G., Giardino, W. J., Jones, J. R. & de Lecea, L. VTA dopaminergic neurons regulate ethologically relevant sleep-wake behaviors. Nat Neurosci 19, 1356-1366, doi:10.1038/nn.4377 (2016).

    1. eLife Assessment

      The study presents important findings revealing previously unresolved conformational dynamics of the heterodimeric type IV ABC transporter TmrAB using single-molecule FRET. The evidence presented is solid, integrating careful experimental design with computational approaches to uncover states that are typically masked and difficult to detect. The work will be of interest to scientists studying the molecular mechanisms of primary active transport processes.

    2. Reviewer #1 (Public review):

      Summary:

      Pecak et al have deciphered the conformational dynamics of a heterodimeric model ABC transporter, TmrAB, a functional homolog of the human antigen transporter TAP, using single-molecule Förster resonance energy transfer with fluorophores attached to residues in either the nucleotide-binding domains or the periplasmic gate. The analysis not only differentiated ATP-free and bound states but also enabled the real-time monitoring of protein conformational changes, precisely dissecting transport cycles and resolving transient intermediates. This study is absolutely significant in providing and establishing a general pipeline delineating the conformational dynamics in heterodimeric ABC transporters.

      Strengths:

      The scientific study is very well documented for experimental design, results, and conclusions supported by the experimental data. The authors have determined the conformational dynamics of TmrAB across different ATP concentrations, including physiological ones, and resolved an outward open state and other conformational states consistent with previous cryoEM and DEER studies.

      Weaknesses:

      The scientific study needs a bit of in-depth analysis with respect to consistency in Kd and its implications on the mechanism.

    3. Reviewer #2 (Public review):

      In their manuscript entitled 'ATP-driven conformational dynamics reveal hidden intermediates in a heterodimeric ABC transporter', Pečak et al. use elegant single-molecule FRET experiments in detergent to investigate the heterodimeric ABC transporter TmrAB. By combining simulations of the transporter's accessible volume with elegant trapping strategies, the authors identify an unresolved outward-facing open state and conclude that it is usually obscured by a rapidly interconverting ATP-bound ensemble. Overall, the study demonstrates that smFRET can resolve the short-lived intermediate states of TmrAB and potentially other ABC transporters that are obscured in ensemble measurements.

      It is a very interesting study that highlights the power of combining high-resolution structural information with spectroscopic approaches. I have three major points and a few minor criticisms.

      Major points:

      (1) The main weakness is that the authors base their conclusions on a very limited set of FRET pairs. While TmrAB has been extensively studied in terms of its structure, the authors should at least acknowledge this limitation more clearly.

      (2) Most smFRET distributions were fitted with one, two, or three Gaussians. However, in several cases, additional populations with noticeable amplitudes appear to be present (e.g., Figure 3c at 0.1 mM and 3 mM ATP; Figure 4a, apo; Figure 4c, 0.3 mM R9L). Could the authors clarify why these populations were not included in the analysis?

      (3) Figure 3c (3 mM ATP): Is it truly possible to distinguish the two states in this distribution?

    4. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Pecak et al have deciphered the conformational dynamics of a heterodimeric model ABC transporter, TmrAB, a functional homolog of the human antigen transporter TAP, using single-molecule Förster resonance energy transfer with fluorophores attached to residues in either the nucleotide-binding domains or the periplasmic gate. The analysis not only differentiated ATP-free and bound states but also enabled the real-time monitoring of protein conformational changes, precisely dissecting transport cycles and resolving transient intermediates. This study is absolutely significant in providing and establishing a general pipeline delineating the conformational dynamics in heterodimeric ABC transporters.

      We thank the reviewer for this accurate and thoughtful summary of our work and its broader significance. We agree that the combination of single-molecule FRET with orthogonal validation approaches enables mechanistic resolution of conformational states and transitions that are not accessible by ensemble measurements. In particular, this framework allows direct discrimination of ATP-free and ATP-bound conformations, real-time tracking of transport cycle progression, and identification of transient intermediates in the heterodimeric ABC transporter TmrAB. We further agree that these capabilities support a generalizable strategy for dissecting conformational dynamics in related ABC transporters.

      Strengths:

      The scientific study is very well documented for experimental design, results, and conclusions supported by the experimental data. The authors have determined the conformational dynamics of TmrAB across different ATP concentrations, including physiological ones, and resolved an outward open state and other conformational states consistent with previous cryoEM and DEER studies.

      Weaknesses:

      The scientific study needs a bit of in-depth analysis with respect to consistency in K<sub>d</sub> and its implications on the mechanism.

      The apparent K<sub>d,ATP</sub> values were determined using two complementary approaches that report on different aspects of the system. Ensemble FRET measurements yielded values of 51 ± 38 µM (TmrAB<sup>NBD</sup>), 68 ± 25 µM (TmrAB<sup>PG</sup>), and 95 ± 26 µM (TmrAB<sup>PG_EQ</sup>), which are in good agreement with previously reported biochemical estimates (~100 µM for TmrAB<sup>EQ</sup>) (Stefan et al, 2020). The slightly elevated value observed for the E→Q variant may reflect modest perturbation of nucleotide handling in this slow-turnover background. Notably, the close agreement between labeled and unlabeled variants indicates that fluorophore attachment does not measurably affect ATP binding.
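
Apparent K<sub>d,ATP</sub> values of this kind come from fitting an ATP titration. The response does not state the binding model, so the following Python sketch simply assumes a single-site hyperbola, signal = B<sub>max</sub>·[ATP]/(K<sub>d</sub> + [ATP]), with concentrations in µM; the function `fit_kd` and its grid bounds are illustrative choices, not the authors' fitting code. For a fixed K<sub>d</sub> the model is linear in B<sub>max</sub>, so B<sub>max</sub> has a closed-form least-squares value and only K<sub>d</sub> needs to be scanned.

```python
def fit_kd(conc_uM, signal, kd_min=0.1, kd_max=1e4, n_grid=4001):
    """Least-squares fit of signal = Bmax * c / (Kd + c); returns (Kd, Bmax).

    Single-site hyperbolic binding is an assumption of this sketch. For a
    fixed Kd the model is linear in Bmax, so Bmax has a closed form and
    only Kd is scanned, on a log-spaced grid between kd_min and kd_max.
    """
    if not any(c > 0 for c in conc_uM):
        raise ValueError("need at least one non-zero concentration")
    step = (kd_max / kd_min) ** (1.0 / (n_grid - 1))
    best = None
    for i in range(n_grid):
        kd = kd_min * step ** i
        basis = [c / (kd + c) for c in conc_uM]
        # Closed-form least-squares Bmax for this Kd
        bmax = sum(b * y for b, y in zip(basis, signal)) / sum(b * b for b in basis)
        sse = sum((y - bmax * b) ** 2 for b, y in zip(basis, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]
```

With noise-free synthetic data generated at K<sub>d</sub> = 50 µM and B<sub>max</sub> = 0.8, the grid recovers both to well within 1%; real titrations would also need uncertainty estimates (for example, by bootstrapping), which this sketch omits.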

      In contrast, smFRET-derived K<sub>d,ATP</sub> values (13 ± 1 µM for TmrAB<sup>NBD</sup> and 2 ± 1 µM for TmrAB<sup>PG</sup>) are systematically lower. This difference likely arises from the difficulty of deconvoluting overlapping FRET populations at sub-K<sub>d,ATP</sub> concentrations, particularly for TmrAB<sup>PG</sup>, whose apo and ATP-bound FRET populations are less well separated. Despite this quantitative offset, both approaches consistently indicate ATP saturation well below physiological concentrations and therefore support the same mechanistic conclusion that ATP binding drives conformational switching in TmrAB.
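
As a rough consistency check, and assuming simple single-site binding, the fractional ATP occupancy θ implied by the lowest, an intermediate, and the highest reported K<sub>d,ATP</sub> can be evaluated at 3 mM ATP, the highest concentration mentioned in the review:

$$
\theta = \frac{[\mathrm{ATP}]}{K_{d,\mathrm{ATP}} + [\mathrm{ATP}]},\qquad
\theta_{3\,\mathrm{mM}} =
\begin{cases}
3000/(95+3000) \approx 0.969, & K_{d,\mathrm{ATP}} = 95\ \mu\mathrm{M}\\
3000/(13+3000) \approx 0.996, & K_{d,\mathrm{ATP}} = 13\ \mu\mathrm{M}\\
3000/(2+3000) \approx 0.999, & K_{d,\mathrm{ATP}} = 2\ \mu\mathrm{M}
\end{cases}
$$

Under this assumption, the spread between the smallest (2 µM) and largest (95 µM) estimates changes the predicted occupancy at 3 mM ATP by only about three percentage points.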

      Reviewer #2 (Public review):

      In their manuscript entitled 'ATP-driven conformational dynamics reveal hidden intermediates in a heterodimeric ABC transporter', Pečak et al. use elegant single-molecule FRET experiments in detergent to investigate the heterodimeric ABC transporter TmrAB. By combining simulations of the transporter's accessible volume with elegant trapping strategies, the authors identify an unresolved outward-facing open state and conclude that it is usually obscured by a rapidly interconverting ATP-bound ensemble. Overall, the study demonstrates that smFRET can resolve the short-lived intermediate states of TmrAB and potentially other ABC transporters that are obscured in ensemble measurements.

      It is a very interesting study that highlights the power of combining high-resolution structural information with spectroscopic approaches. I have three major points and a few minor criticisms.

      We thank the reviewer for the thoughtful and constructive evaluation of our manuscript and for highlighting the strength of combining structural and single-molecule approaches. We have addressed all major and minor points in detail below and revised the manuscript where appropriate to clarify limitations, justify analysis choices, and improve transparency.

      Major points:

      (1) The main weakness is that the authors base their conclusions on a very limited set of FRET pairs. While TmrAB has been extensively studied in terms of its structure, the authors should at least acknowledge this limitation more clearly.

      We agree that our conclusions are based on a limited number of FRET reporter pairs, and we now explicitly state this limitation in the revised manuscript. The chosen labeling positions were selected to probe two functionally critical regions—the nucleotide-binding domains and the periplasmic gate—based on prior structural and spectroscopic evidence. While this represents sparse sampling of the full conformational space, it is consistent with typical smFRET studies of membrane transporters, where experimental constraints generally limit the number of simultaneously accessible labeling positions (Asher et al, 2021; Asher et al, 2022; Levring et al, 2023; Wang et al, 2020).

      Importantly, both independent reporter variants yield consistent ATP-dependent population shifts, supporting the robustness of the observed trends. We further clarify that additional labeling sites could, in principle, resolve finer structural sub-states; however, given the already limited population separation in the current variants, such extensions would likely provide diminishing returns in state resolvability under the present experimental conditions. This trade-off is now explicitly discussed.

      (2) Most smFRET distributions were fitted with one, two, or three Gaussians. However, in several cases, additional populations with noticeable amplitudes appear to be present (e.g., Figure 3c at 0.1 mM and 3 mM ATP; Figure 4a, apo; Figure 4c, 0.3 mM R9L). Could the authors clarify why these populations were not included in the analysis?

      We thank the reviewer for this careful observation. Low-amplitude subpopulations are occasionally detected in individual histograms; however, they were not included in the quantitative model because they do not meet criteria for reproducibility, amplitude robustness, or structural assignability. Specifically, these features vary between replicates, contribute minimally to the total population, and cannot be mapped to structurally or biochemically defined states based on available cryo-EM (Hofmann et al, 2019), DEER/PELDOR (Barth et al, 2018; Barth et al, 2020), or accessible-volume simulations.

      Similar minor subpopulations have been reported in other smFRET studies and have often been attributed to photophysical or labeling heterogeneity (Asher et al, 2022; Husada et al, 2018). To avoid over-parameterization, we therefore restricted the analysis to reproducible, structurally supported states. This rationale is now clarified in the revised manuscript.
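
The criteria stated above for excluding minor peaks are reproducibility, amplitude and structural assignability. A complementary, purely statistical guard against over-parameterization, which the response does not claim to use, is to compare mixtures with different numbers of Gaussians by an information criterion. The Python sketch below fits one-dimensional Gaussian mixtures by expectation-maximization and scores them with the Bayesian information criterion (BIC); the function names and the quantile-based initialization are our assumptions. Each additional Gaussian adds three parameters (mean, width, weight), so a minor peak lowers the BIC only if it raises the log-likelihood by more than about (3/2)·ln n.

```python
import math
import random
import statistics


def _norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data, k, n_iter=100, sd_floor=1e-3):
    """Fit a k-component 1D Gaussian mixture by expectation-maximization.

    Returns (weights, means, sds, log_likelihood).
    """
    n = len(data)
    srt = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared width, equal weights
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [statistics.pstdev(data)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            p = [w * _norm_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)]
            tot = max(sum(p), 1e-300)
            resp.append([v / tot for v in p])
        # M-step: re-estimate weight, mean and width of each component
        for j in range(k):
            rj = [r[j] for r in resp]
            nj = sum(rj)
            weights[j] = nj / n
            if nj < 1e-12:
                continue  # empty component: keep its previous mean and width
            mus[j] = sum(r * x for r, x in zip(rj, data)) / nj
            var = sum(r * (x - mus[j]) ** 2 for r, x in zip(rj, data)) / nj
            sds[j] = max(math.sqrt(var), sd_floor)  # floor prevents collapse onto one point
    loglik = sum(
        math.log(max(sum(w * _norm_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)), 1e-300))
        for x in data
    )
    return weights, mus, sds, loglik


def bic(loglik, k, n):
    """Bayesian information criterion; lower is better."""
    n_params = 3 * k - 1  # k means + k widths + (k - 1) free weights
    return n_params * math.log(n) - 2.0 * loglik


if __name__ == "__main__":
    # Hypothetical two-state histogram data, used only to demonstrate the calls
    rng = random.Random(0)
    demo = [rng.gauss(0.3, 0.05) for _ in range(300)] + [rng.gauss(0.7, 0.05) for _ in range(300)]
    for k in (1, 2, 3):
        *_, ll = fit_gmm_1d(demo, k)
        print(k, round(bic(ll, k, len(demo)), 1))
```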

      (3) Figure 3c (3 mM ATP): Is it truly possible to distinguish the two states in this distribution?

      We agree that state separation in the TmrAB<sup>PG</sup> variant is limited (ΔE = 0.11), and we now explicitly acknowledge this constraint in the manuscript. To improve robustness under these conditions, we used a constrained fitting strategy in which the apo-state distribution was fixed from nucleotide-free measurements, reducing parameter degeneracy during fitting of the ATP-bound datasets.
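
The constrained strategy can be sketched as an expectation-maximization fit of a two-Gaussian mixture in which the apo component's mean and width are fixed to values taken from the nucleotide-free histogram, so only the ATP-bound fraction, mean and width are estimated. This Python sketch is illustrative only: Gaussian peak shapes, the function name `fit_bound_fraction` and the starting values are assumptions, and the authors' actual fitting routine may differ. Compared with an unconstrained two-Gaussian fit, fixing the apo peak removes two free parameters, which matters most when the two peaks are only ΔE = 0.11 apart.

```python
import math
import random


def _pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_bound_fraction(data, apo_mu, apo_sd, bound_mu_init, n_iter=150, sd_floor=1e-3):
    """Two-Gaussian mixture fit with the apo component held fixed.

    Returns (bound_fraction, bound_mean, bound_sd).
    """
    n = len(data)
    w, mu, sd = 0.5, bound_mu_init, apo_sd  # starting guesses (assumptions)
    for _ in range(n_iter):
        # E-step: probability that each observation belongs to the bound state
        r = []
        for x in data:
            pb = w * _pdf(x, mu, sd)
            pa = (1.0 - w) * _pdf(x, apo_mu, apo_sd)
            r.append(pb / (pb + pa) if pb + pa > 0.0 else 0.0)
        s = sum(r)
        if s == 0.0:
            break  # bound component vanished; keep the last estimate
        # M-step: update only the bound-state parameters; apo stays fixed
        w = s / n
        mu = sum(ri * x for ri, x in zip(r, data)) / s
        sd = max(math.sqrt(sum(ri * (x - mu) ** 2 for ri, x in zip(r, data)) / s), sd_floor)
    return w, mu, sd


if __name__ == "__main__":
    # Hypothetical data: apo at E = 0.40, bound state 0.11 higher, 80% bound
    rng = random.Random(0)
    sim = [rng.gauss(0.40, 0.04) for _ in range(200)] + [rng.gauss(0.51, 0.04) for _ in range(800)]
    print(fit_bound_fraction(sim, apo_mu=0.40, apo_sd=0.04, bound_mu_init=0.46))
```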

      While single-molecule trajectory-based approaches such as Hidden Markov Modeling would be ideal for resolving dynamic interconversion, this was not feasible due to the low fraction of dynamic traces at the available temporal resolution. We therefore rely on population-level analysis, which remains consistent across replicates and reporter variants.

      Notably, independent measurements from two reporter positions (TmrAB<sup>NBD</sup> and TmrAB<sup>PG</sup>) yield similar ATP-bound population fractions at saturating ATP concentrations (~77% vs. ~80%), supporting the robustness of the inferred state distribution despite partial overlap.

      References

      Asher WB, Geggier P, Holsey MD, Gilmore GT, Pati AK, Meszaros J, Terry DS, Mathiasen S, Kaliszewski MJ, McCauley MD, Govindaraju A, Zhou Z, Harikumar KG, Jaqaman K, Miller LJ, Smith AW, Blanchard SC, Javitch JA (2021) Single-molecule FRET imaging of GPCR dimers in living cells. Nat Methods 18: 397–405. doi:10.1038/s41592-021-01081-y

      Asher WB, Terry DS, Gregorio GGA, Kahsai AW, Borgia A, Xie B, Modak A, Zhu Y, Jang W, Govindaraju A, Huang LY, Inoue A, Lambert NA, Gurevich VV, Shi L, Lefkowitz RJ, Blanchard SC, Javitch JA (2022) GPCR-mediated beta-arrestin activation deconvoluted with single-molecule precision. Cell 185: 1661–1675.e16. doi:10.1016/j.cell.2022.03.042

      Barth K, Hank S, Spindler PE, Prisner TF, Tampé R, Joseph B (2018) Conformational coupling and trans-inhibition in the human antigen transporter ortholog TmrAB resolved with dipolar EPR spectroscopy. J Am Chem Soc 140: 4527–4533. doi:10.1021/jacs.7b12409

      Barth K, Rudolph M, Diederichs T, Prisner TF, Tampé R, Joseph B (2020) Thermodynamic basis for conformational coupling in an ATP-binding cassette exporter. J Phys Chem Lett 11: 7946–7953. doi:10.1021/acs.jpclett.0c01876

      Hofmann S, Januliene D, Mehdipour AR, Thomas C, Stefan E, Brüchert S, Kuhn BT, Geertsma ER, Hummer G, Tampé R, Moeller A (2019) Conformation space of a heterodimeric ABC exporter under turnover conditions. Nature 571: 580–583. doi:10.1038/s41586-019-1391-0

      Husada F, Bountra K, Tassis K, de Boer M, Romano M, Rebuffat S, Beis K, Cordes T (2018) Conformational dynamics of the ABC transporter McjD seen by single-molecule FRET. EMBO J 37: e100056. doi:10.15252/embj.2018100056

      Levring J, Terry DS, Kilic Z, Fitzgerald G, Blanchard SC, Chen J (2023) CFTR function, pathology and pharmacology at single-molecule resolution. Nature 616: 606–614. doi:10.1038/s41586-023-05854-7

      Nocker C, Pečak M, Nocker T, Fahim A, Sušac L, Tampé R (2026) Single-molecule dynamics reveal ATP binding alone powers substrate translocation by an ABC transporter. Nat Commun 17. doi:10.1038/s41467-026-70021-1

      Nöll A, Thomas C, Herbring V, Zollmann T, Barth K, Mehdipour AR, Tomasiak TM, Bruchert S, Joseph B, Abele R, Olieric V, Wang M, Diederichs K, Hummer G, Stroud RM, Pos KM, Tampé R (2017) Crystal structure and mechanistic basis of a functional homolog of the antigen transporter TAP. Proc Natl Acad Sci U S A 114: E438–E447. doi:10.1073/pnas.1620009114

      Stefan E, Hofmann S, Tampé R (2020) A single power stroke by ATP binding drives substrate translocation in a heterodimeric ABC transporter. eLife 9: e55943. doi:10.7554/eLife.55943

      Wang L, Johnson ZL, Wasserman MR, Levring J, Chen J, Liu S (2020) Characterization of the kinetic cycle of an ABC transporter by single-molecule and cryo-EM analyses. eLife 9: e56451. doi:10.7554/eLife.56451

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Weaknesses:

      There are quite a few weaknesses, some related to the actual study and some more strongly related to the reporting about the study in the manuscript. The concerns are listed roughly in the order in which they appear in the manuscript.

      We sincerely appreciate the time and effort you dedicated to reviewing our manuscript. We agree that the weaknesses you raised are well founded, and we accept almost all of the suggestions detailed below, particularly those on clarifying the statistics and the sample size determination. Please see our specific responses below.

      Major Comments

      (1) In the introduction, the authors present procrastination nearly as if it were the most relevant and problematic issue there is in psychology. Surely, procrastination is a relevant and study-worthy topic, but that is also true if it is presented in more modest (and appropriate) terms. The manuscript mentions that procrastination is a main cause of psychopathology and bodily disease. These claims could possibly be described as 'sensationalized'. Also, the studies to support these claims seem to report associations, not causal mechanisms, as is implied in the manuscript.

      Thank you for this very practical suggestion. We agree that the statements underlining the importance of procrastination were somewhat overreaching. In the revision, we have toned down these claims by explicitly describing the supporting findings as “associative evidence”, and have rephrased several terms in a more modest and balanced style. Please see the specific revisions in the main text below:

      Introduction Section (Page 5, Line 64-81)

      “Procrastination is increasingly becoming a prevalent behavioral problem around the world; it reflects the irrational voluntary postponement of scheduled tasks despite being worse off for such delays (Blake, 2019; Steel, 2007). In epidemiological investigations, more than 15% of adults were identified as having chronic procrastination problems, and the situation for students was worse, as 70-80% of undergraduates engaged in procrastination (American College Health Association, 2022; Ferrari et al., 2005). Moreover, behavioral genetic evidence indicates a certain heritability of procrastination in humans (Gustavson et al., 2017; Gustavson et al., 2014, 2015). In addition to its prevalence, the undesirable associations between procrastination behavior and health also warrant attention. There is cumulative evidence showing close associations between procrastination behavior and work performance, financial status, interpersonal relationships, and subjective well-being (Ferrari, 1994; Pychyl & Sirois, 2016; Steel et al., 2021). Further, as prospective cohort studies indicate, many mental health problems emerge alongside procrastination, particularly sleep problems, depression, and anxiety (Hairston & Shpitalni, 2016; Johansson et al., 2023). Even worse, chronic procrastination behavior has been observed to impair general health, as manifested by its strong associations with close system disruption, gastrointestinal disturbance, and a high risk of hypertension and cardiovascular disease (Sirois, 2015; Sirois, 2016). ... ”

      (2) It is laudable that the study was pre-registered; however, the cited OSF repository cannot be accessed and therefore, the OSF materials cannot be used to (a) check the preregistration or to (b) fill in the gaps and uncertainties about the exact analyses the authors conducted (this is important because the description of the analyses is insufficiently detailed and it is often unclear how they analyzed the data).

      We regret that a serious technical problem has made our preregistration inaccessible. OSF disabled my OSF account, stating that it had detected “suspicious user’s activities” (please see the screenshot below). As a result, none of the materials deposited in this account, including this preregistration, can be accessed. We contacted the OSF team but received no workable solution for recovering the preregistered report. We suspect that this may have been triggered by my affiliation change to the Third Military Medical University of the People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because a preregistration that cannot be accessed falls short of best practice, we have also, in addition to reporting this case transparently, removed the statements regarding preregistration elsewhere in the revised manuscript. Furthermore, we fully understand that the inadequate methodological detail in our reporting made the statistics of this study difficult to follow. We have therefore added extensive details to the Methods section clarifying how each analysis was conducted, so that our conclusions can be evaluated. Please see the additions described in our responses to Comments #4-9 below.

      Methods Section (Page 5, Line 186-191)

      “This study fully adhered to the CONSORT reporting guidelines and was originally preregistered in the OSF repository (10.17605/OSF.IO/Y3EDT). However, owing to a technical problem with the OSF account service (see SM), this OSF page is no longer accessible. For transparency and in line with best practices of open science, a preregistration statement has been reconstructed from the original protocol documentation to clarify the a priori hypotheses, sample size determination, and analysis plans for this study (Table S1).”

      (3) Related to the previous point: I find it impossible to check the analyses with respect to their appropriateness because too little detail and/or explanation is given. Therefore, I find it impossible to evaluate whether the conclusions are valid and warranted.

      Again, we apologize for the confusion caused by inadequate statistical and methodological details. This manuscript was previously reviewed at Nature Human Behaviour, whose editorial length constraints required us to omit or remove a substantial number of details. As you kindly suggested, we have added extensive descriptions clarifying how we carried out the statistical analyses in the present study. Please see the specific instances below.

      (4) Why is a medium effect size chosen for the a priori power analysis? Is it reasonable to assume a medium effect size? This should be discussed/motivated. Related: 18 participants for a medium effect size in a between-subjects design strikes me as implausibly low; even for a within-subjects design, it would appear low (but perhaps I am just not fully understanding the details of the power analysis).

      Thank you for raising this crucial question. We based this a priori effect size on our previously published work (Xu et al., 2023, J Exp Psychol Gen; 152(4): 1122-1133). In that pilot study, we identified a significant interaction effect between single-session tDCS (active vs sham) and time (pre-test vs post-test) on procrastination willingness in a laboratory setting (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]), indicating a medium effect size. This pilot study therefore provides supportive evidence for setting this effect size a priori. To clarify, we have explicitly justified the choice of this effect size in the Methods section.

      Methods Section (Page 5, Line 206-215)

      “A full randomized block design was used to assign participants to the two groups (active neuromodulation group, NM; sham-control group, SC) (see Fig. 2C). Based on the pilot study of single-session tDCS effects on procrastination willingness (t = 2.38, p = .02, 95% CI [0.14, 1.49]; Xu et al., 2023), an a priori power analysis was conducted in G*Power assuming a medium effect size (f = 0.25, 1-β err prob = 0.80), yielding a total sample size of 18 to reach acceptable power (see SM Methods and Fig. S1)....”

      We fully understand that the sample size required to detect a medium effect appears low, and that 18 participants per group is a limited sample in any case. After double-checking the power analysis, we confirmed that this sample size requirement is correct. Please see the G*Power output in Author response image 1.

      Author response image 1.

      Although the power analysis contains no computational errors, we are aware that this limited sample size may reduce statistical robustness. To address this weakness, we have explicitly acknowledged this caveat in the Limitations section:

      Limitations Section (Page 12, Line 637-640)

      “... In addition to technical limitations, given the limited sample size (total N = 46), caution is warranted in generalizing these findings, and further validation in a large-scale cohort is needed.”

      (5) It remains somewhat ambiguous whether the sham group had the same number of stimulation sessions as the verum stimulation group; please clarify: Did both groups come in the same number of times into the lab? I.e., were all procedures identical except whether the stimulation was verum or sham?

      Yes. We followed the CONSORT guidelines in carrying out this double-blind trial, and all participants in both groups received the same number of stimulation sessions in our lab. That is, apart from the stimulation type (verum vs sham), all procedures, equipment, and even the testing room were identical for all participants. For clarification, we have stated this explicitly in the main text:

      Results Section (Page 9, Line 419-423)

      “In both groups, almost all participants (93.2%, 41/44) reported that the pain from the current stimulation was acceptable, and believed they were receiving treatment (91.30% (21/23) for the active neuromodulation group (NM), 86.95% (20/23) for the sham control group (SC), χ² = 0.224, p = .636). All participants underwent identical experimental procedures except for the stimulation type (active vs sham). ...”
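The blinding check above compares two independent proportions with a Pearson χ² test. As a minimal sketch, the statistic can be recomputed from the counts in the quoted text (21/23 vs 20/23 believing they were treated) using only the Python standard library. The function name `pearson_chi2_2x2` is our own, and the sketch assumes no continuity correction; for 1 degree of freedom the upper-tail probability equals erfc(√(χ²/2)).

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table: [[a, b], [c, d]] with groups as rows and outcome categories as columns.
    Returns (chi2, p), where p is the upper-tail probability of chi-square with 1 df.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and outcome.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Blinding check from the quoted Results: [believed treated, did not].
nm = [21, 23 - 21]
sc = [20, 23 - 20]
chi2, p = pearson_chi2_2x2([nm, sc])
print(f"chi2 = {chi2:.3f}, p = {p:.3f}")  # chi2 = 0.224, p = 0.636
```

Applied to the sham group's 6-month follow-up counts reported later (16/23 before vs 17/23 after), the same function gives χ² ≈ 0.11 and p ≈ 0.74. The test treats the two rows as independent samples; for before/after counts from the same participants, a paired test such as McNemar's would be the usual alternative.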

      (6) The TDM analysis and hyperbolic discounting approach were unclear to me; this needs to be described in more detail, otherwise it cannot be evaluated.

      We apologize for the inadequate details, which hindered a precise understanding of the TDM and the hyperbolic discounting model. The Temporal Decision Model (TDM) was originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021). It conceptualizes procrastination as a failed trade-off between task outcome value (i.e., the motivation to act now to obtain the task reward) and task aversiveness (i.e., the motivation to avoid acting now to escape negative experiences). When task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply far from the deadline but slowly near the deadline (Zhang et al., 2019). Given the nonlinear dynamics inherent in this hyperbolic discounting, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) to improve curve-fitting performance (see the schematic diagram at https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time):

      Specifically, following the log-spaced temporal sampling rule, five time points were selected to satisfy the statistical prerequisites for hyperbolic model fitting, with sampling density increasing toward the deadline (e.g., for a task due at 20:00, sampling occurred at 10:00, 16:00, 18:00, 19:30, and 20:00). At each time point, participants rated task aversiveness (A) on a 0–100 visual analog scale (VAS). Task aversiveness discounting was then calculated as 1 − (A_t / A_earliest), where t_earliest is the earliest sampling point (e.g., 10:00), serving as the reference for immediate execution. Using GraphPad Prism (v9, 525), we then estimated the AUC from these five data points following Myerson et al. (2001), computed as the trapezoidal integration of task aversiveness discounting over time. With this approach, a higher AUC reflects stronger temporal discounting of task aversiveness: participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and less avoidance behavior. In other words, a participant who shows greater discounting of task aversiveness (a higher AUC) experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly leading to less procrastination. As you kindly suggested, we have added these details to clarify how the hyperbolic discounting approach was used to determine sampling time points and to calculate the AUC of task aversiveness discounting.
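The discounting and AUC steps described above can be illustrated with a short Python sketch. It assumes sampling times expressed in clock hours and uses hypothetical VAS ratings; it is not the authors' GraphPad Prism workflow. The helper name `aversiveness_discounting_auc` and its optional rescaling of time to [0, 1] (in the spirit of the proportional axes used by Myerson et al., 2001) are our assumptions.

```python
def aversiveness_discounting_auc(times, aversiveness, normalize_time=False):
    """Area under the task-aversiveness discounting curve (trapezoidal rule).

    times: sampling moments in hours, strictly increasing (earliest first).
    aversiveness: VAS ratings (0-100) at those moments; the first must be > 0.
    normalize_time: if True, rescale time to [0, 1] (hypothetical option);
        otherwise the area is expressed in hour units.
    """
    if len(times) != len(aversiveness) or len(times) < 2:
        raise ValueError("need matching sequences with at least two points")
    if any(t2 <= t1 for t1, t2 in zip(times, times[1:])):
        raise ValueError("times must be strictly increasing")
    a_earliest = aversiveness[0]
    if a_earliest <= 0:
        raise ValueError("earliest aversiveness rating must be positive")

    # Discounting relative to the earliest point: 1 - A(t) / A(earliest).
    discount = [1 - a / a_earliest for a in aversiveness]

    t0, span = times[0], times[-1] - times[0]
    xs = [(t - t0) / span for t in times] if normalize_time else list(times)

    # Trapezoidal integration of discounting over time.
    return sum((x2 - x1) * (d1 + d2) / 2
               for x1, x2, d1, d2 in zip(xs, xs[1:], discount, discount[1:]))


# Hypothetical ratings at 10:00, 16:00, 18:00, 19:30 and 20:00 (deadline).
times = [10.0, 16.0, 18.0, 19.5, 20.0]
ratings = [80, 60, 50, 40, 40]
print(aversiveness_discounting_auc(times, ratings))  # 2.28125
print(aversiveness_discounting_auc(times, ratings, normalize_time=True))
```

Because the raw area scales with the time unit and with the length of each task's sampling window, rescaling time is one way to keep AUCs comparable across tasks whose windows differ.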

      Methods Section (Page 6, Line 268-283)

      “On the Task day, we developed a mobile app implementing the experience sampling method (ESM) to track participants’ real-time evaluations of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how unpleasant one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefit of the task outcome brought about by completing the task before the deadline (Zhang & Feng, 2020). As conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted as the deadline approaches, with sharp discounting far from the deadline and slow discounting near it (Zhang & Feng, 2020; Zhang et al., 2021). Given the nonlinear dynamics inherent in this hyperbolic discounting, the five ESM recording moments were selected a priori for each task using a log-spaced temporal sampling scheme (Myerson et al., 2001), with sampling density increasing toward the deadline, such as 10:00 (earliest), 16:00, 18:00, 19:30, and 20:00 (deadline). Five sampling points meet the statistical prerequisite for hyperbolic model fitting, which requires at least four points (Green & Myerson, 2004). Accordingly, recording moments were individually tailored to each task for each participant in this ESM procedure.”
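A log-spaced schedule like the example above (10:00, 16:00, 18:00, 19:30, 20:00) can be generated programmatically. The sketch below is hypothetical: the function `log_spaced_sampling_times`, the geometric spacing of time-to-deadline, and the 30-minute minimum gap are assumptions rather than the authors' documented procedure. For a 10:00 start and a 20:00 deadline it yields intermediate moments of roughly 16:19, 18:39, and 19:30, similar in shape to the example.

```python
from datetime import datetime, timedelta


def log_spaced_sampling_times(start, deadline, n_points=5, min_gap_hours=0.5):
    """Return n_points sampling moments from start to deadline, denser near the deadline.

    Hypothetical helper: time-to-deadline values are geometrically spaced from the
    full window down to min_gap_hours, and the deadline itself is the final moment.
    """
    if n_points < 3:
        raise ValueError("need at least three points")
    window = (deadline - start).total_seconds() / 3600
    if not 0 < min_gap_hours < window:
        raise ValueError("min_gap_hours must lie inside the window")
    k = n_points - 2  # geometric steps between the full window and min_gap_hours
    ratio = (min_gap_hours / window) ** (1 / k)
    hours_left = [window * ratio ** i for i in range(k + 1)] + [0.0]
    return [deadline - timedelta(hours=h) for h in hours_left]


start = datetime(2024, 1, 1, 10, 0)     # hypothetical task day
deadline = datetime(2024, 1, 1, 20, 0)
for moment in log_spaced_sampling_times(start, deadline):
    print(moment.strftime("%H:%M"))
```

A geometric progression for the time remaining is one simple way to make sampling denser toward the deadline; the minimum gap controls how close the penultimate sample sits to the deadline.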

      Methods Section (Page 7, Line 318-334)

      “... As articulated by the temporal decision model above, the task aversiveness evoked by executing a task is temporally dynamic and follows a hyperbolic discounting pattern, with sharp discounting far from the deadline and slow discounting near it (Zhang & Feng, 2020). To quantify task aversiveness while taking its dynamics into account, the model-free area under the curve (AUC) was calculated. Specifically, following the log-spaced temporal sampling rule, task aversiveness (A) was measured on a 100-point visual analog scale at the five sampling moments. Task aversiveness discounting was then calculated as 1 − (A(t) / A(earliest)), where t(earliest) was the earliest sampling point, serving as the reference for immediate execution. Subsequently, using GraphPad Prism (v9, 525), the AUC was computed as the trapezoidal integration of task aversiveness discounting over time across the five data points, following Myerson et al. (2001). Accordingly, a higher AUC reflects stronger temporal discounting of task aversiveness as the deadline approaches, meaning that participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and reduced avoidance behavior. As for task outcome value, it was theoretically posited as a relatively stable evaluation of the task (Zhang & Feng, 2020; Zhang et al., 2021).”

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the Experimental Analysis of Behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of Experimental Psychology: General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome value and task aversiveness impact task procrastination through separate neural pathways. Cerebral Cortex, 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley Interdisciplinary Reviews: Cognitive Science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of Experimental Psychology: General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (7) Coming back to the point about the statistical analyses not being described in enough detail: one important example is that it is unclear whether random slopes were included in the mixed-effects models. This is highly relevant, as it has repeatedly been shown that omitting random slopes can lead to extremely inflated Type 1 errors (e.g., inflating Type 1 errors by a factor of ten, so that a significant p value of .05 might be obtained when the true p value is .5). Thus, if random slopes have indeed been omitted, significant effects may be significant only because of inflated Type 1 error. Without more information about the models, this cannot be ruled out.

      Thank you for this very timely and crucial comment. After careful scrutiny, we confirmed the statistical flaw you pointed out: participants had been modeled with random intercepts only, not with random slopes. As you kindly suggested, we have reanalyzed all the statistics with by-participant random slopes added (i.e., (1 + day | SubjectID)). The results showed a statistically significant interaction effect for both procrastination willingness (β = -7.8, SE = 1.8, DF = 45.6, p < .001) and actual procrastination rates (β = -7.4, SE = 2.4, DF = 46.6, p = .004), indicating the effectiveness of multi-session neuromodulation in mitigating procrastination. In the post-hoc simple effect analyses, participants who received active neuromodulation (NM) showed a significant increase in task-execution willingness (i.e., decreased procrastination willingness; NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction) and a decrease in actual procrastination rates (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), whereas no such effects were found for participants in the sham control group (for willingness, SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction; for actual procrastination, SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction). We appreciate your pointing out this crucial statistical weakness, and have confirmed that our findings remain reliable after adding random slopes to guard against inflated Type 1 error. Moreover, as you suggested, we have incorporated these statistical details, particularly those concerning the GLMM, into the main text to facilitate your evaluation. Please see the specific revisions below:

      Methods Section (Page 8, Line 381-401)

      “To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) with a full factorial design was constructed for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before the deadline). Sex, age, and socioeconomic status (SES) were modeled as covariates of no interest. Following the National Bureau of Statistics of China (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers, from 1 (poor) to 7 (rich), on the basis of per capita annual household income. To avoid subjective rating bias stemming from daily mood, we measured participants’ daily emotional state at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, from 0 for “completely uncomfortable” to 100 for “definitely happy”). The average of the two ratings was entered into the GLMM as a covariate of no interest, yielding the final model “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were tested at baseline and after the last intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and the random-effects structure. To validate statistical robustness, in addition to the parametric tests on continuous outcomes, we conducted a between-group comparison of the number of tasks on which procrastination occurred, using the nonparametric χ² test with φ correction or Fisher’s exact test....”

      Results Section (Page 9, Line 428-449)

      “To identify whether ms-tDCS targeting the left DLPFC can alleviate subjective procrastination willingness and actual procrastination behavior, a generalized linear mixed-effects model with Satterthwaite’s approximation was built, with task-execution willingness and actual procrastination rates (PR) as the primary outcomes, respectively. For procrastination willingness, results showed a statistically significant interaction effect between multi-session neuromodulation and group (β = -7.8, SE = 1.8, DF = 45.6, p < .001; Fig. 3A). Post-hoc simple effect analysis revealed significantly increased task-execution willingness (i.e., decreased procrastination willingness) after neuromodulation in the active neuromodulation group (NM-before: 35.65 ± 30.20, NM-after: 80.43 ± 19.92, t.ratio = 5.4, p < .0001, Tukey correction), but no such effect in the sham control group (SC-before: 37.57 ± 26.46, SC-after: 47.35 ± 30.49, t.ratio = 0.3, p = .77, Tukey correction) (Fig. 3B-C). A monotonic uptrend in task-execution willingness was further observed across sessions in the active NM group, indicating gradually increasing neuromodulation effects (Fig. 3D; p < .01, Mann-Kendall test). For actual procrastination behavior, changes in actual procrastination rates across all sessions are detailed in Fig. 3E. Similarly, a statistically significant interaction effect was identified (β = -7.4, SE = 2.4, DF = 46.6, p = .004), and simple effect analysis further revealed decreased actual procrastination rates after ms-tDCS in the active neuromodulation group (NM-before: 43.26 ± 39.09, NM-after: 0.00 ± 0.00, t.ratio = 5.1, p < .0001, Tukey correction), but no prominent changes in the sham control group (SC-before: 46.47 ± 40.75, SC-after: 33.34 ± 37.82, t.ratio = 0.7, p = .48, Tukey correction) (Fig. 3F-G). Also, a significant downtrend in procrastination rates across all sessions was identified in the active NM group (Fig. 3H; p < .01, Mann-Kendall test).”
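The Mann-Kendall test used for the session-wise trends is non-parametric and easy to compute directly. The sketch below implements its standard form (S statistic, tie-corrected variance, continuity-corrected normal approximation, two-sided p); the function name `mann_kendall` and the per-session series are hypothetical placeholders, not data from the study.

```python
import math


def mann_kendall(series):
    """Mann-Kendall test for a monotonic trend (normal approximation, tie-corrected).

    Returns (S, z, p) with a two-sided p-value.
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    # S = number of increasing pairs minus number of decreasing pairs over time.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, corrected for groups of tied values.
    counts = {}
    for x in series:
        counts[x] = counts.get(x, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if var_s == 0:
        return s, 0.0, 1.0  # all values tied: no trend can be detected
    # Continuity-corrected standard normal score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return s, z, p


# Hypothetical per-session group means (not data from the study).
willingness = [36, 41, 47, 52, 50, 61, 66, 72, 78, 80]
print(mann_kendall(willingness))
```

The test only asks whether later values tend to exceed earlier ones, so it detects monotonic trends without assuming that the change across sessions is linear.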

      (8) Related to the previous point: The authors report, for example, on the first results page, line 420, an F-test as F(1, 269). This means the test has 269 residual degrees of freedom despite a sample size of about 50 participants. This likely suggests that relevant random slopes for this test were omitted, meaning that this statistical test likely suffers from inflated Type 1 error, and the reported p-value < .001 might be severely inflated. If that is the case, each observation was treated as independent instead of accounting for the nestedness of data within participants. The authors should check this carefully for this and all other statistical tests using mixed-effects models.

      Thank you for highlighting this very timely and helpful point. As you correctly pointed out, we did not include random slopes in the original GLMM, which substantially risks inflating the false-positive rate (i.e., Type I error). After adding the random slopes, we reanalyzed all the GLMM statistics and confirmed that all the findings remain reliable under the new models. Thank you again for this crucial statistical advice; please see the response above for full details of the revisions addressing this comment.

      (9) Many of the statistical procedures seem quite complex and hard to follow. If the results are indeed so robust as they are presented to be, would it make sense to use simpler analysis approaches (perhaps in addition to the complex ones) that are easier for the average reader to understand and comprehend?

      We thank you for this practical and helpful comment. In the original manuscript, we used a joint model of longitudinal and survival data (JM-LSD), together with machine learning algorithms, to strengthen the robustness of our statistical findings. Nevertheless, we agree with you that there is no need to complicate the analyses by repeatedly probing the same research question to increase methodological robustness at the expense of readability and intelligibility for a broader audience. As you suggested, we have removed these complicated statistical methods and retained only the primary ones, the GLMM and the χ² contingency-table test, together with a complementary one, the Mann-Kendall monotonic trend test. We have therefore rewritten almost the entire Results section. Please see the specific instances below:

      Results Section (Page 9, Line 468-485)

      “Ms-tDCS changes task aversiveness and task-outcome value

      Both task aversiveness and task outcome value serve as key pathways determining whether one procrastinates. We therefore used a generalized linear mixed-effects model to examine the effects of ms-tDCS on task aversiveness and task outcome value. Changes in task aversiveness across all sessions are shown in Fig. 4A and 4C. We observed a statistically significant decrease in task aversiveness and an increase in task outcome value after ms-tDCS in the neuromodulation group (Task aversiveness: interaction effect, β = -0.12, SE = 0.04, DF = 46.7, p = .002; simple effect, NM-before (AUC): 1.13 ± 0.53, NM-after (AUC): 1.95 ± 0.85, t.ratio = 4.5, p < .001, Tukey correction; Outcome value: β = -6.8, SE = 1.74, DF = 46.2, p < .001; simple effect, NM-before: 35.86 ± 27.82, NM-after: 73.08 ± 23.33, t.ratio = 5.0, p < .001, Tukey correction; see Fig. 4B), but not in the sham control group (Task aversiveness: SC-before (AUC): 1.07 ± 0.51, SC-after (AUC): 1.28 ± 0.46, t.ratio = 1.3, p = .20, Tukey correction; Outcome value: SC-before: 34.00 ± 25.17, SC-after: 40.13 ± 28.94, t.ratio = 0.8, p = .41, Tukey correction; see Fig. 4D). In the neuromodulation (NM) group, task aversiveness steadily decreased with the cumulative number of stimulation sessions, while perceived task outcome value increased significantly (see Fig. 4E-F, p < .05, Mann-Kendall test). This provides causal evidence that neuromodulation of the left DLPFC both reduces task aversiveness and enhances task outcome value.”

      Results Section (Page 10, Line 525-542)

      “Long-term effects of ms-tDCS

      We also conducted a follow-up investigation to test the long-term retention of the ms-tDCS effect on actual procrastination. All participants except one in the neuromodulation group completed the follow-up 6 months after the last neuromodulation session (N_NM = 22, N_SC = 23). A GLMM was constructed with time (PR before the first neuromodulation session vs. PR 6 months after the last session) as the factor of interest. Results showed a statistically significant group*time interaction effect (β = 16.5, SE = 9.9, p = .049). The simple-effect model revealed a decrease in actual procrastination rates in the active neuromodulation group 6 months after the last stimulation compared with baseline (β = -22.05, SE = 10.0, p = .038, Tukey correction; NM-before: 40.68 ± 37.96, NM-after (6 months): 18.63 ± 29.80), and null effects in the SC group (β = 1.26, SE = 9.78, p = .99, Tukey correction; SC-before: 46.47 ± 40.75, SC-after (6 months): 47.73 ± 39.18) (see Fig. 6). Furthermore, using a nonparametric χ² test to compare the number of procrastinated tasks, we still found a statistically significant reduction in procrastination frequency in the NM group 6 months after neuromodulation compared with baseline (χ² = 3.30, p = .035, NM-before: 68.19% (15/22), NM-after (6 months): 40.91% (9/22)), whereas no significant change was observed in the SC group (χ² = 0.11, p = .74, SC-before: 69.56% (16/23), SC-after (6 months): 73.91% (17/23)). Therefore, beyond its short-term effects, the benefit of ms-tDCS neuromodulation in reducing procrastination shows long-term retention.”

      (10) As was noted by an earlier reviewer, the paper reports nearly exclusively about the role of the left DLPFC, while there is also work that demonstrates the role of the right DLPFC in self-control. A more balanced presentation of the relevant scientific literature would be desirable.

      We are grateful to you for noticing the unbalanced presentation of the literature on the left DLPFC. As you kindly suggested, we have added literature on the association between self-control and the right DLPFC. Please see our revision below:

      Introduction Section (Page 4, Line 137-143)

      “...In addition to left lateralization, there is solid evidence of significant associations between self-control and the right DLPFC, particularly given that this region is specifically involved in top-down regulation, future self-continuity representation, and social decisions (Huang et al., 2025; Lin & Feng, 2024; Knoch & Fehr, 2007). Nevertheless, Xu and colleagues found that anodal stimulation of the right DLPFC did not change procrastination willingness by modulating either value evaluation or emotion regulation (Xu et al., 2023).”

      (11) Active stimulation reduced procrastination, reduced task aversiveness, and increased the outcome value. If I am not mistaken, the authors claim based on these results that the brain stimulation effect operates via self-control, but, unless I missed it, the authors do not have any direct evidence (such as measures or specific task measures) that actually captures self-control. Thus, the involvement of self-control seems speculative, with no empirical evidence for it; or am I mistaken about this? If that is indeed correct, I think it needs to be made explicit that it is an untested assumption (which might be very plausible, but is still not empirically tested in the current study) that self-control plays any role in the reported results.

      We truly appreciate your pointing out this conceptual weakness. Yes, you have understood the causal chain correctly: we conceptually speculate that HD-tDCS over the left DLPFC acts through self-control to change procrastination, but we did not empirically validate this component of the chain (brain stimulation→increased self-control→increased task outcome value→decreased procrastination). Specifically, we did not collect data to measure self-control directly at either baseline or post-neuromodulation. We therefore agree that this should be stated explicitly in the main text. Following this advice, we have revised part of the Conclusion to clearly present the role of self-control in mitigating procrastination as a hypothesis to be tested, and have acknowledged this point in the Limitations section:

      Abstract Section (Page 2, Line 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized role of self-control in procrastination, and offers a validated, theory-driven strategy for interventions.”

      Results Section (Page 10, Line 489-492 and 520-522)

      “Given the dual neurocognitive pathways identified above—reduced task aversiveness and increased task-outcome value—we proposed that these changes, conceptually driven by enhanced self-control via ms-tDCS over left DLPFC, account for how neuromodulation reduces procrastination. ...”

      “In summary, these findings demonstrated a mechanistic pathway underlying procrastination: self-control, conceptualized as governed by the left DLPFC, plausibly mitigates procrastination by increasing task-outcome value.”

      Discussion Section (Page 13, Line 642-645)

      “Moreover, this study did not collect data for assessing participants’ self-control at either baseline or post-neuromodulation, thereby limiting our ability to determine whether the effects on procrastination were uniquely attributable to neuromodulation-induced changes in self-control. ...”

      (12) Figures 3F and 3H show that procrastination rates in the active modulation group go to 0 in all participants by sessions 6 and 7. It seems surprising, and to be honest rather unlikely, that there is absolutely no individual variation in this group anymore. In any case, this is quite extraordinary and should be explicitly discussed, if this is indeed correct: What might be the reasons for such an extreme pattern? Just a random fluctuation? Are the results robust if these extreme cells are ignored? The authors remove other cells in their design due to unusual patterns, so perhaps the same should be done here, at least as a robustness check.

      Thank you for raising this highly important and helpful comment. We fully understand that this result is somewhat extraordinary; it was equally striking to us when we unblinded the data. After carefully scrutinizing the data and statistics, we can confirm that this pattern is genuine. In support of this observation, we received numerous thank-you letters from participants in the active neuromodulation group, who reported that their procrastination in real-life activities had substantially decreased after completing the trial. While this does not constitute formal scientific evidence, we are glad to see the benefits of this neuromodulation for these procrastinators.

      Two factors could account for this pattern. One interpretation attributes it to “scalar inflation”. In the present study, the procrastination rate was calculated as 1 minus the task-completion rate (e.g., 80%, 60%, 40%) at the deadline. At sessions #6 and #7, all participants completed their real-life tasks before the deadline, yielding a 0% procrastination rate (1 minus a 100% completion rate) without any between-individual variation. Thus, rather than there being no individual variation in procrastination, this measure (the procrastination rate) is too coarse to capture subtle differences. For instance, although Participants #1 and #2 both showed a 0% procrastination rate, meaning that both completed their tasks before the deadline, Participant #1 might have finished 3 hours before the deadline, whereas Participant #2 might have finished only 10 minutes before. In this case, “scalar inflation” leads us to perceive both participants as having equivalent procrastination rates, although Participant #2 may have a higher procrastination level than Participant #1. As defined in the field, procrastination is conceptualized as “not completing a task before the deadline”. Thus, a task completed before the deadline, whether close to or well in advance of it, is classified as “no procrastination”. In the present study, the primary outcome is whether a participant procrastinated on a real-life task in real-world settings, irrespective of when she/he completed it before the deadline. This measure therefore fits our conceptualization of procrastination.

      Another reason is the potential cumulative effect of sequential multi-session tDCS. As shown by Mann-Kendall trend tests, the procrastination rates show a significant monotonic downward trend in the active neuromodulation group across sessions, even after removing sessions #6 and #7. This indicates that improvements in procrastination may accumulate as the number of sessions increases, implying a potential “dose-dependent effect”. Although this interpretation is speculative, such “dose-dependent effects” have been well documented in previous neuromodulation studies, which show a robust linear association between the number of sessions and effectiveness (cf. Cole et al., 2020; Hutton et al., 2023; Sabé et al., 2024; Schulze et al., 2018). Therefore, although this extreme pattern is somewhat extraordinary compared to previous observations, it is plausible.
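      For readers unfamiliar with the Mann-Kendall trend test, the following is a minimal Python sketch of the statistic, offered as an illustration rather than the analysis code used in the study. It assumes no tied values and uses the normal approximation with a continuity correction; a negative S (and tau) indicates a downward trend. The function name mann_kendall and the example series are hypothetical.

```python
import math
from typing import Sequence


def mann_kendall(values: Sequence[float]) -> tuple[int, float, float]:
    """Return (S, tau, two-sided p) for the Mann-Kendall trend test.

    Sketch only: assumes no tied values and uses the normal
    approximation with a continuity correction.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S = number of increasing pairs minus number of decreasing pairs,
    # taking every pair (i, j) with i earlier than j in session order.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    tau = s / (n * (n - 1) / 2)  # Kendall's tau-a
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S without ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, tau, p
```

      For example, mann_kendall([0.45, 0.40, 0.33, 0.25, 0.18, 0.10, 0.05]) returns S = -21 and tau = -1.0 for this strictly decreasing, made-up series of seven session means. With only seven sessions, an exact permutation distribution may be preferable to the normal approximation, and real data containing ties (such as several sessions at 0%) would require the tie-corrected variance.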

      We agree that a robustness check removing sessions #6, #7, or both is an excellent idea, as it guards against potential biases from these extreme cells. In this check, all group*treatment_day interaction effects remained significant when removing session #6, session #7, or both (all p-values < .05), indicating high statistical robustness. Please see Supplementary Tables S3 and S4.

      Taken together, although these findings are extraordinary, we confirm that they are statistically robust to the extreme cells. As you kindly suggested, we have added the results of the robustness check to the revised Supplemental Materials section.

      References

      Cole, E. J., Stimpson, K. H., Bentzley, B. S., Gulser, M., Cherian, K., Tischler, C., Nejad, R., Pankow, H., Choi, E., Aaron, H., Espil, F. M., Pannu, J., Xiao, X., Duvio, D., Solvason, H. B., Hawkins, J., Guerra, A., Jo, B., Raj, K. S., Phillips, A. L., … Williams, N. R. (2020). Stanford Accelerated Intelligent Neuromodulation Therapy for Treatment-Resistant Depression. The American journal of psychiatry, 177(8), 716–726. https://doi.org/10.1176/appi.ajp.2019.19070720

      Hutton, T. M., Aaronson, S. T., Carpenter, L. L., Pages, K., Krantz, D., Lucas, L., Chen, B., & Sackeim, H. A. (2023). Dosing transcranial magnetic stimulation in major depressive disorder: Relations between number of treatment sessions and effectiveness in a large patient registry. Brain stimulation, 16(5), 1510–1521. https://doi.org/10.1016/j.brs.2023.10.001

      Sabé, M., Hyde, J., Cramer, C., Eberhard, A., Crippa, A., Brunoni, A. R., Aleman, A., Kaiser, S., Baldwin, D. S., Garner, M., Sentissi, O., Fiedorowicz, J. G., Brandt, V., Cortese, S., & Solmi, M. (2024). Transcranial Magnetic Stimulation and Transcranial Direct Current Stimulation Across Mental Disorders: A Systematic Review and Dose-Response Meta-Analysis. JAMA network open, 7(5), e2412616. https://doi.org/10.1001/jamanetworkopen.2024.12616

      Schulze, L., Feffer, K., Lozano, C., Giacobbe, P., Daskalakis, Z. J., Blumberger, D. M., & Downar, J. (2018). Number of pulses or number of sessions? An open-label study of trajectories of improvement for once-vs. twice-daily dorsomedial prefrontal rTMS in major depression. Brain stimulation, 11(2), 327–336. https://doi.org/10.1016/j.brs.2017.11.002

      (13) The supplemental materials, unfortunately, do not give more information, which would be needed to understand the analyses the authors actually conducted. I had hoped I would find the missing information there, but it's not there.

      We apologize that the supplemental materials (SM) in the original submission were uninformative. As you suggested, we have added substantial detail to the main text clarifying how we conducted the data analyses, and have also tightened the whole SM section to improve readability and comprehensibility. We hope that the revised manuscript now offers clear and adequate information on the methods and statistics for a broad readership.

      In sum, the reported/cited/discussed literature gives the impression of being incomplete/selectively reported; the analyses are not reported sufficiently transparently/fully to evaluate whether they are appropriate and thus whether the results are trustworthy or not. At least some of the patterns in the results seem highly unlikely (0 procrastination in the verum group in the last 2 observation periods), and the sample size seems very small for a between-subjects design.

      Thank you for this very helpful summary. As you kindly suggested, we have overhauled the manuscript to address the points listed here. In particular, we added relevant literature to balance our claims, substantially expanded the reporting of statistics for sufficiency and transparency, conducted a robustness check confirming that our findings are robust to the plausible extreme patterns (sessions #6 and #7), and justified how the sample size was determined a priori to achieve medium statistical power. Please see above for point-by-point details of how we addressed these comments.

      Reviewer #2 (Public Review):

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct current stimulation targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They report that tDCS effects on task-execution willingness and procrastination are mediated by task outcome value and claim that this neuromodulatory intervention reduces procrastination rates quantified by their task. Although the study addresses an interesting question regarding the role of DLPFC in procrastination, concerns about the validity of the procrastination measures moderate enthusiasm for the study and limit the interpretability of the mechanism underlying the reported findings.

      Strengths:

      (1) This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The approach is solid and aims to address an important question regarding the putative role of DLPFC in modulating chronic procrastination behavior.

      (2) The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Thank you for taking the time to review our manuscript, for your kind remarks on the strength of the research design and on our approach to quantifying task aversiveness, and for sharing such helpful and insightful evaluations. As you correctly pointed out, the original manuscript lacked detailed, clear and understandable reporting of the measures (e.g., real-world procrastination), statistics and methods. Following all your suggestions, we have thoroughly revised the manuscript to address your comments point-by-point. Please see our full responses below.

      Weaknesses:

      (1) The lack of specificity surrounding the "real-world measures" of procrastination is problematic and undermines the strength of the evidence surrounding the DLPFC effects on procrastination behavior. It would be helpful to detail what "real-world tasks" individuals reported, which would inform the efficacy of the intervention on procrastination performance across the diversity of tasks. It is also unclear when and how tasks were reported using the ESM procedure. Providing greater detail of these measures overall would enhance the paper's impact.

      We genuinely appreciate your raising this crucial comment. We apologize for omitting many methodological details to comply with the editorial length requirements, which hampered understanding of how we measured “real-life tasks” and “real-world procrastination”.

      As shown in the schematic diagram of the experimental procedure (Fig. 1), the experimental protocol alternated between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On each Neuromodulation Day, participants received either active or sham HD-tDCS and, critically, before stimulation were instructed to specify a real-life task they were required to complete the following day, with a deadline between 18:00 and 24:00. This ensured ≥24 hours between neuromodulation and the task deadline, isolating offline after-effects. For instance, on Day #2 (a Neuromodulation Day), before stimulation, participants were asked to report a real-life task with a deadline between 18:00 and 24:00 on the following Task Day (Day #3) (please see the schematic diagram in Author response image 2).

      Author response image 2.

      Examples of real-life tasks reported by participants include: “Complete and submit a homework assignment”, “Complete a standardized English proficiency test”, “Complete an online course module required for applying for a Class C driver’s license”, “Prepare slides for a seminar presentation”, “Practice guitar”, “Practice Chinese calligraphy”, and “Do the laundry”. Reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥2,000 m for exercise), indicating reasonable task diversity.

      On each Task Day, participants engaged in an intensive Experience Sampling Method (iESM) protocol via a custom-built mobile app. Using this app, participants reported a subjective task-execution willingness score (a one-item 100-point visual analog scale, “How willing are you to do this task?”, 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”; procrastination willingness = 100 – the task-execution willingness score), subjective task aversiveness (a one-item 100-point visual analog scale), subjective task outcome value (a one-item 100-point visual analog scale), and the objective procrastination rate.

      Rather than relying on self-reported scores from these one-item visual analog scales, we asked participants to report their actual task completion rate to quantify real-world procrastination behavior objectively. Specifically, at the deadline, each participant was asked to report whether she/he had completed the task. If she/he had not yet completed it (i.e., procrastination behavior emerged), she/he was further required to report the percentage of the task completed (1%–99%), which was defined as the task completion rate. The real-world procrastination rate for the task was then calculated as 1 – the task completion rate. For instance, if a participant did not complete her/his real-life task before the deadline (i.e., she/he procrastinated on this task) and reported completing 75% of it at the deadline, her/his real-world procrastination rate was 25% (1 – 75%) (please see the schematic diagram in Author response image 3).
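      The scoring rules above can be summarized in a short Python sketch. It is an illustration under our own assumptions, not the app's actual code: the function names are hypothetical, rates are expressed as percentages, a completed task is scored as a 100% completion rate, and an incomplete task must carry a reported completion rate between 1% and 99%.

```python
from typing import Optional


def procrastination_rate(completed: bool, percent_done: Optional[int] = None) -> float:
    """Procrastination rate PR = 100% - completion rate CR, in percent."""
    if completed:
        completion_rate = 100  # finished before the deadline: PR = 0%
    else:
        # Incomplete tasks report a completion rate of 1-99% at the deadline.
        if percent_done is None or not 1 <= percent_done <= 99:
            raise ValueError("incomplete tasks need a completion rate of 1-99%")
        completion_rate = percent_done
    return float(100 - completion_rate)


def procrastination_willingness(task_execution_willingness: float) -> float:
    """Procrastination willingness = 100 - TEW on the 0-100 scale."""
    if not 0 <= task_execution_willingness <= 100:
        raise ValueError("TEW must lie on the 0-100 scale")
    return 100.0 - task_execution_willingness
```

      For the example in the text, procrastination_rate(False, 75) gives 25.0, a task finished before the deadline gives procrastination_rate(True) = 0.0, and a task-execution willingness score of 80 corresponds to a procrastination willingness of 20.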

      Moreover, to go beyond a purely self-reported task completion rate, each participant was also asked to upload proof (e.g., screenshots of submitted assignments, photos of printed documents, system timestamps) to the ESM digital system for validation.

      Author response image 3.

      To determine the sampling time points for the mobile app in the ESM, we drew on both the conceptual Temporal Decision Model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), conceptualizes procrastination as a failure of the trade-off between task outcome value (i.e., the motivation to act now to pursue the task reward) and task aversiveness (i.e., the motivation to avoid acting now to avoid negative experiences). When task aversiveness overrides the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply when far from the deadline but slowly when nearing the deadline (Zhang et al., 2019). To maximize statistical power for fitting these dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001) (please see the schematic diagram at https://uen.pressbooks.pub/behavioraleconomics/chapter/the-reality-of-homo-sapiens, where each point indicates a sampling time).

      Following this scheme (Myerson et al., 2001), five time points were selected to fulfill the statistical prerequisites for hyperbolic model fitting, with increasing sampling density toward the deadline (e.g., for a task due at 20:00: sampled at 10:00, 16:00, 18:00, 19:30, 20:00). Once the five task-specific sampling time points were determined for each participant, the mobile app sent a notification at each point asking her/him to report the current task aversiveness and task outcome value immediately. As the primary outcomes, the procrastination rate (i.e., 1 – the task completion rate) and procrastination willingness were sampled at the deadline.
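      A log-spaced schedule of this kind can be sketched as follows: the lead time before the deadline is spaced geometrically between a maximum and a minimum value, and a final prompt is placed at the deadline itself. The function name, the 10-hour and 30-minute lead times, and rounding to whole minutes are our assumptions, chosen only to resemble the 20:00 example; this is not the app's actual scheduling code, and its output differs from the rounded times quoted above.

```python
from datetime import datetime, timedelta


def log_spaced_prompts(deadline: datetime, max_lead_h: float = 10.0,
                       min_lead_h: float = 0.5, n_before: int = 4) -> list[datetime]:
    """Return prompt times whose lead before the deadline is geometrically
    spaced, followed by one final prompt at the deadline itself."""
    if n_before < 2 or not 0 < min_lead_h < max_lead_h:
        raise ValueError("need n_before >= 2 and 0 < min_lead_h < max_lead_h")
    # Constant ratio between successive lead times (log-spaced leads).
    ratio = (min_lead_h / max_lead_h) ** (1.0 / (n_before - 1))
    prompts = []
    for i in range(n_before):
        lead_minutes = round(max_lead_h * ratio ** i * 60)  # whole minutes
        prompts.append(deadline - timedelta(minutes=lead_minutes))
    prompts.append(deadline)  # final prompt exactly at the deadline
    return prompts
```

      For a 20:00 deadline, this sketch yields prompts at 10:00, 16:19, 18:39, 19:30, and 20:00; rounding the intermediate times to convenient clock values would give a schedule like the one described in the text.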

      We also fully agree that transparency about task diversity strengthens the generalizability of our findings. In response, we have tabulated the real-life tasks reported in this experiment in a separate Appendix 1, with automatic translations from Chinese to English via Qwen GPT. Please see below for what we have added to the main text:

      Methods Section (Page 6-7, Line 238-308)

      “Nested cross-sectional longitudinal design

      This study used a nested cross-sectional longitudinal design to investigate whether multiple-session anodal HD-tDCS targeting the left DLPFC could reduce actual procrastination behavior and to probe how this effect manifests. To assess procrastination in daily life, we implemented a 15-day protocol alternating between Neuromodulation Days (Days 2, 4, 6, 8, 10, 12, 14) and Task Days (Days 1, 3, 5, 7, 9, 11, 13, 15). On Neuromodulation Days, 20-min anodal HD-tDCS targeting the left DLPFC was delivered every 2 days to the active HD-tDCS group, while the sham-control group received sham HD-tDCS. This HD-tDCS training was repeated for a total of seven sessions over 15 days (see Fig. 1a). Crucially, to capture procrastination in ecologically valid contexts, prior to receiving either active or sham HD-tDCS (administered 09:00–18:00), participants were instructed to specify a real-life task they were personally obligated to complete the following day, with a self-defined deadline strictly constrained to 18:00–24:00 to ensure ≥24 hours between stimulation offset and task deadline, thereby isolating offline after-effects. This task had to meet three criteria: (a) it was already assigned in real-world settings; (b) its deadline was constrained to 18:00–24:00 (see above); (c) it was likely to induce procrastination. In this way, more than 300 real-life tasks were collected, spanning academic (e.g., “submit a statistics homework assignment”), occupational (e.g., “draft and email a project proposal”), administrative (e.g., “complete online application for Class C driver’s license”), self-improvement (e.g., “practice guitar for ≥30 minutes”), domestic (e.g., “do laundry”), and health-related domains (e.g., “running 2,000m for exercise”). The full task list is tabulated in Appendix 1.

      As primary outcomes, all participants were required to report their task-execution willingness (TEW) for the real-life task 24 hours post-neuromodulation (Zhang & Feng, 2020; Zhang, Liu, et al., 2019). Procrastination willingness was thus quantified as the 100 – TEW score (see below for details). Furthermore, we asked participants to report the actual task completion rate (CR) at the deadline (e.g., a participant who had finished 90% of a homework assignment at the deadline reported this to us at that time). The actual procrastination rate (PR) was thus quantified as 1 – CR.

      On each Task Day, we used a purpose-developed mobile app implementing the experience sampling method (ESM) to track real-time evaluations of task aversiveness and task outcome value (see Fig. 1). Task aversiveness describes how disagreeable one perceives performing a given real-life task to be, whereas outcome value refers to the subjective benefits brought about by completing the task before the deadline (Zhang & Feng, 2020). As conceptualized by the temporal decision model (TDM) of procrastination, perceived task aversiveness is hyperbolically discounted as the deadline approaches, being discounted sharply far from the deadline but slowly near it (Zhang & Feng, 2020; Zhang et al., 2021). Thus, given the nonlinear dynamics inherent in hyperbolic discounting, five ESM recording moments were selected a priori for each task using a log-spaced temporal sampling scheme (Myerson et al., 2001), with increasing sampling density toward the deadline, e.g., 10:00 (earliest), 16:00, 18:00, 19:30, and 20:00 (deadline). Five sampling points meet the statistical prerequisite for hyperbolic model fitting (which requires ≥4 points; Green & Myerson, 2004). Accordingly, recording moments were individually tailored to each task for each participant in the ESM procedure. To obviate confounding of task aversiveness evaluations by daily emotions, we used the averaged PANAS scores at 10:00 (morning) and 16:00 (afternoon), collected with the ESM app, as anchoring points to quantify daily emotions. Before each HD-tDCS session, each participant was required to report a real-life task whose deadline was the following day. To capture the long-term effect of HD-tDCS (i.e., an interval of at least 24 hours between HD-tDCS and task completion), the reported task deadline was required to fall between 18:00 and 24:00. When each sampling time was reached, the app sent a notification prompting participants to complete an online form for data collection.

      Quantification of covariates of interest

      This study had two outcome variables: task-execution willingness and procrastination rate (PR). Task-execution willingness evaluates one’s subjective inclination to avoid procrastination (Zhang & Feng, 2020). Participants reported their task-execution willingness on a 100-point scale (0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). This metric was recorded 24 hours after neuromodulation to examine its long-term effects. PR quantifies the extent to which a task has been procrastinated and was calculated as 1 – CR (task completion rate). Critically, at the precise deadline, the app prompted participants to (a) indicate task completion status (yes/no) and, if the task was incomplete, (b) report the percentage completed (1–99%), defined as the task CR, while simultaneously uploading objective evidence (e.g., screenshots of submitted files, photos of physical outputs, system-generated logs, or app-exported records). If the task was completed before the deadline, the CR was 100% and the PR was 0% (1 – CR). PR was recorded at the actual task deadline for each participant. We also re-assessed actual procrastination with PR 6 months after the last neuromodulation session to test the long-term retention of the neuromodulation effect.”
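      To make the hyperbolic-discounting assumption in the excerpt above concrete, the sketch below models perceived task aversiveness as A(D) = a0 / (1 + k * D), where D is the time remaining before the deadline, and estimates a0 and k by least squares. The parameterization, the names a0 and k, and the grid search over k are our illustrative assumptions; the TDM papers cited above should be consulted for the exact model and estimation procedure used by the authors.

```python
from typing import Sequence


def hyperbolic(a0: float, k: float, delay: float) -> float:
    """Hyperbolically discounted value a0 / (1 + k * delay)."""
    return a0 / (1.0 + k * delay)


def fit_hyperbolic(delays: Sequence[float], values: Sequence[float],
                   k_grid: Sequence[float]) -> tuple[float, float]:
    """Least-squares fit of (a0, k): grid search over k, closed form for a0."""
    if not k_grid or len(delays) != len(values):
        raise ValueError("need a non-empty k grid and matching data")
    best = None
    for k in k_grid:
        basis = [1.0 / (1.0 + k * d) for d in delays]
        # For fixed k the model is linear in a0: a0 = <y, x> / <x, x>.
        a0 = sum(y * x for y, x in zip(values, basis)) / sum(x * x for x in basis)
        sse = sum((y - a0 * x) ** 2 for y, x in zip(values, basis))
        if best is None or sse < best[0]:
            best = (sse, a0, k)
    _, a0, k = best
    return a0, k
```

      For fixed k the model is linear in a0, so a0 has a closed-form least-squares solution and only k needs a one-dimensional search; with five sampling points per task this keeps the fit simple and transparent, although a nonlinear optimizer could replace the grid.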

      References

      Myerson, J., Green, L., & Warusawitharana, M. (2001). Area under the curve as a measure of discounting. Journal of the experimental analysis of behavior, 76(2), 235–243. https://doi.org/10.1901/jeab.2001.76-235

      Xu, T., Zhang, S., Zhou, F., & Feng, T. (2023). Stimulation of left dorsolateral prefrontal cortex enhances willingness for task completion by amplifying task outcome value. Journal of experimental psychology. General, 152(4), 1122–1133. https://doi.org/10.1037/xge0001312

      Zhang, S., Verguts, T., Zhang, C., Feng, P., Chen, Q., & Feng, T. (2021). Outcome Value and Task Aversiveness Impact Task Procrastination through Separate Neural Pathways. Cerebral cortex (New York, N.Y. : 1991), 31(8), 3846–3855. https://doi.org/10.1093/cercor/bhab053

      Zhang, S., Liu, P., & Feng, T. (2019). To do it now or later: The cognitive mechanisms and neural substrates underlying procrastination. Wiley interdisciplinary reviews. Cognitive science, 10(4), e1492. https://doi.org/10.1002/wcs.1492

      Zhang, S., & Feng, T. (2020). Modeling procrastination: Asymmetric decisions to act between the present and the future. Journal of experimental psychology. General, 149(2), 311–322. https://doi.org/10.1037/xge0000643

      (2) Additionally, it is unclear whether the reported effects could be due to differential reporting of tasks (e.g., it could be that participants learned across sessions to report more achievable or less aversive task goals, rather than stimulation of DLPFC reducing procrastination per se). It would be helpful to demonstrate whether these self-reported tasks are consistent across sessions and similar in difficulty within each participant, which would strengthen the claims regarding the intervention.

      Thank you for raising this crucial comment. We agree that the reported effects may vary with task difficulty and task-execution proficiency, which could confound the effects of stimulation on mitigating procrastination. On the one hand, as you correctly note, because we did not collect data on task difficulty or other relevant task characteristics, we cannot completely rule out this confound when interpreting our findings. We have therefore explicitly acknowledged this limitation in the Discussion section.

      On the other hand, although we lack quantitative evidence, the risk of confounding the main effects with disparities in task characteristics was controlled experimentally. As reported above, all reported tasks had to meet three criteria: (a) they were already assigned in real-world settings; (b) the deadline was constrained to 18:00–24:00; (c) they were likely to elicit procrastination. Accordingly, each participant was clearly instructed to report a real-life task that was likely to be procrastinated in real-world settings, and was not allowed to report easy, achievable, or costless tasks. Consistent with this, the reported tasks spanned academic (e.g., submitting an assignment), occupational (e.g., preparing a presentation), administrative (e.g., applying for a license), self-improvement (e.g., practicing guitar for ≥30 min), domestic (e.g., laundry), and health-related domains (e.g., running ≥2,000 m for exercise), indicating reasonable task diversity and difficulty. We also observed high within-subject task homogeneity: for instance, Participant #5 reported almost exclusively academic tasks across all sessions. Therefore, as the task list shows (see Appendix 1), the self-reported tasks were plausibly consistent across sessions and similar in difficulty within each participant.

      In addition, in our blinding check, almost all participants reported believing that they were receiving the active treatment: 91.30% (21/23) in the active neuromodulation group (NM) and 86.96% (20/23) in the sham control group (SC) (χ<sup>2</sup> = 0.224, p = .636), indicating that the double-blinding was effective. If participants had learned across sessions to report more achievable or less aversive tasks, their procrastination willingness and procrastination rates would decrease progressively across sessions regardless of whether they were in the active neuromodulation group or the sham group. However, no such progressive decrease was found in the sham control group (Mann-Kendall test: procrastination willingness, tau = 0.60, p = .13; procrastination rate, tau = 0.61, p = .13), indicating no statistically significant learning or strategic effect on task performance. Again, thank you for this crucial comment; we hope these clarifications address it.
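      The blinding statistic can be checked from the reported counts with a Pearson chi-square test on the 2 × 2 table of belief in receiving active treatment by group (NM: 21 of 23; SC: 20 of 23). The Python sketch below assumes no continuity (Yates) correction, an assumption we infer because it reproduces the reported values; for one degree of freedom the upper-tail probability equals erfc(sqrt(χ²/2)). The function name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square without continuity correction for the table
    [[a, b], [c, d]]; returns (statistic, p-value with df = 1)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    if 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n  # expected count under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

      Calling chi_square_2x2(21, 2, 20, 3) gives χ² ≈ 0.224 and p ≈ .636, matching the values reported above.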

      Limitations Section (Page 12, Line 637-640)

      “In addition, although participants were instructed to report valid real-life tasks with a high probability of being procrastinated, we did not measure task difficulty or consistency across sessions for each participant. Consequently, interpreting the effects of neuromodulation on mitigating procrastination as “unique contributions” warrants caution. ...”

      (3) It would be helpful to show evidence that the procrastination measures are valid and consistent, and detail how each of these measures was quantified and differed across sessions and by intervention. For instance, while the AUC metric is an innovative way to quantify the temporal dynamics of task-aversiveness, it was unclear how the timepoints were collected relative to the task deadline. It would be helpful to include greater detail on how these self-reported tasks and deadlines were determined and collected, which would clarify how these procrastination measures were quantified and varied across time.

      We appreciate your highlighting the importance of clarifying how procrastination was measured, which substantially helps readers interpret these findings. As reported above, the primary outcomes of this experiment were subjective procrastination willingness and the objective actual procrastination rate. For subjective procrastination willingness, participants used the purpose-built mobile app to report a subjective task-execution willingness score (a one-item 100-point visual analog scale, “How willing are you to do this task?”, with 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”). Procrastination willingness was then computed as 100 minus the task-execution willingness score. For the objective procrastination rate, rather than relying on scores from one-item visual analog scales, we asked participants to report their actual “task completion rate from 1% to 99%” to quantify real-world procrastination behavior objectively. Full details can be found in Response #1.

      To determine the sampling time points used to quantify the AUC, we drew on both the conceptual Temporal Decision Model and the statistical Myerson algorithm. Specifically, the Temporal Decision Model (TDM), originally proposed by our team (Xu et al., 2023; Zhang et al., 2019, 2020, 2021), conceptualizes procrastination as a failed trade-off between task outcome value (i.e., the motivation to act now to obtain the task reward) and task aversiveness (i.e., the motivation to avoid acting now to avoid negative experiences). Once task aversiveness outweighs the pursuit of task outcome value, procrastination emerges. One overarching hypothesis of this model is that task aversiveness is hyperbolically discounted as the deadline approaches: it is discounted sharply far from the deadline but slowly near the deadline (Zhang et al., 2019). To maximize statistical power for fitting these dynamic motivational curves, we employed a log-spaced temporal sampling scheme (Myerson et al., 2001). Following this approach, five time points were selected to meet the statistical prerequisites for hyperbolic model fitting, with sampling density increasing toward the deadline (e.g., for a task due at 20:00, samples were taken at 10:00, 16:00, 18:00, 19:30, and 20:00).

      Once the five task-specific sampling time points had been determined for each participant, the mobile app sent a message at each time point asking the participant to report task aversiveness and task outcome value immediately. After task aversiveness had been captured at all five time points, task aversiveness discounting was calculated as 1 - A(t)/A(t<sub>earliest</sub>), where A(t) is the aversiveness reported at time t and t<sub>earliest</sub> is the earliest sampling point (e.g., 10:00), which serves as the reference for immediate execution. Using GraphPad Prism software (v9, 525), we then estimated the AUC from these five data points following Myerson et al. (2001), computed by trapezoidal integration of task aversiveness discounting over time. Under this approach, a higher AUC reflects stronger temporal discounting of task aversiveness: participants experience a faster decline in subjective aversiveness as execution is delayed, yielding lower effective aversiveness and less avoidance behavior. That is, a participant who shows greater discounting of task aversiveness, as reflected by a higher AUC, experiences a more pronounced reduction in subjective aversiveness upon postponement, plausibly resulting in less procrastination.
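      The discounting-and-AUC procedure described above can be sketched in a few lines. This is a minimal illustration, not the GraphPad Prism workflow actually used: it assumes that time is rescaled to the [0, 1] range across the sampling window (the normalization used by Myerson et al., 2001), reuses the example schedule for a task due at 20:00, and uses hypothetical aversiveness ratings. The function names are illustrative.

```python
def minutes(clock):
    """Convert an 'HH:MM' clock string to minutes after midnight."""
    hours, mins = clock.split(":")
    return int(hours) * 60 + int(mins)


def aversiveness_auc(times, aversiveness):
    """Trapezoidal AUC of task aversiveness discounting over normalized time.

    Discounting at each sample is 1 - A(t) / A(t_earliest). Time is rescaled
    to [0, 1] across the sampling window; this normalization is an assumption
    borrowed from Myerson et al. (2001), not confirmed by the response.
    """
    t = [minutes(c) for c in times]
    span = t[-1] - t[0]
    x = [(ti - t[0]) / span for ti in t]
    a0 = aversiveness[0]  # aversiveness at the earliest sampling point
    y = [1 - a / a0 for a in aversiveness]
    # Sum the trapezoids between consecutive sampling points.
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


schedule = ["10:00", "16:00", "18:00", "19:30", "20:00"]
ratings = [80, 60, 50, 40, 40]  # hypothetical 0-100 aversiveness ratings
print(round(aversiveness_auc(schedule, ratings), 4))  # 0.2281
```

      Because discounting is anchored to the earliest rating, a participant whose aversiveness stays flat across the day scores an AUC of 0, and larger drops in aversiveness as execution is postponed raise the AUC, matching the interpretation given above.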

      Taken together, following your suggestion, we have added substantial detail to the revised manuscript clarifying how procrastination was measured, when data were sampled, and how the AUC was estimated. Please see Response #1.

      (4) There are strong claims about the multi-session neuromodulation alleviating chronic procrastination, which should be moderated, given the concerns regarding how procrastination was quantified. It would also be helpful to clarify whether DLPFC stimulation modulates subjective measures of procrastination, or alternatively, whether these effects could be driven by improved working memory or attention to the reported tasks. In general, more work is needed to clarify whether the targeted mechanisms are specific to procrastination and/or to rule out alternative explanations.

      We fully agree with this consideration: the conclusions currently claimed in the main text should be toned down, given the inherent shortcomings mentioned above. As you suggested, we have moderated our overall claims regarding the effects of multi-session neuromodulation in alleviating chronic procrastination. Please see specific instances below:

      Abstract Section (Page 2, Lines 55-57)

      “... This establishes a precise, value-driven neurocognitive pathway to account for the conceptualized roles of self-control on procrastination, and potentially offers a validated, theory-driven strategy for interventions.”

      Conclusion Section (Page 13, Lines 657-664)

      “In conclusion, this study potentially provides an effective way to reduce both procrastination willingness and actual procrastination behavior by using neuromodulation on the left DLPFC. Furthermore, such effects were observed as long-term after-effects at a 2-day interval and were partly retained at the 6-month follow-up. More importantly, this study identified that ms-tDCS neuromodulation could decrease task aversiveness and increase task outcome value, and further demonstrated that the increased task outcome value could predict decreased procrastination, a relationship conceptually driven by enhancing self-control. In this vein, the current study enriches our understanding of the neurocognitive mechanism of procrastination by showing the prominent role of increased task outcome value in reducing procrastination. It may also provide an effective method for intervening in human procrastination.”

      Moreover, as clarified above, in addition to the objective measure of procrastination behavior, we also used a one-item 100-point visual analog scale (“How willing are you to do this task?”, with 0 for “I will definitely procrastinate this task” and 100 for “I will take action to complete this task immediately”) to measure subjective procrastination willingness. Results demonstrated that subjective procrastination willingness decreased significantly across neuromodulation sessions in the active group, but not in the sham control group, consistent with the observed reduction in the objective procrastination measure. In addition, we agree that it is crucial to note that we cannot conclude that the effects of neuromodulation on mitigating procrastination are driven uniquely by increased task outcome value. Because we did not measure other factors, such as working memory and attention, we cannot rule out other neurocognitive pathways. To address this point, we have removed or rephrased such statements throughout the revised manuscript, and have explicitly constrained our interpretation of this neurocognitive mechanism (i.e., increased task outcome value) to the theory-driven framework of the temporal decision model.

      Reviewer #3 (Public review):

      This manuscript explores whether high-definition transcranial direct current stimulation (HD-tDCS) of the left DLPFC can reduce real-world procrastination, as predicted by the Temporal Decision Model (TDM). The research question is interesting, and the topic - neuromodulation of self-regulatory behavior - is timely.

      Many thanks for kindly dedicating time to review our manuscript, and for the helpful comments detailed below. Thank you for appreciating the novelty of this study.

      However, the study also suffers from a limited sample size, and sometimes it was difficult to follow the statistics.

      Thank you for pointing out these crucial concerns. As you correctly note, the sample size is relatively small; however, our a priori power analysis indicates that it provides adequate statistical power to detect a medium effect size.

      To estimate the sample size, we determined the a priori effect size from our previously published work (Xu et al., 2023, J Exp Psychol Gen;152(4):1122-1133). In this pilot study, we identified a significant interaction effect between single-session tDCS stimulation (active vs sham) and time (pre-test vs post-test) (t = 2.38, p = .02, n = 27; 95% CI [0.14, 1.49]) on procrastination willingness in laboratory settings, indicating a medium effect size. This pilot study therefore provided the basis for specifying the effect size a priori.

      Using the G*Power software with an assumed medium effect size, we determined that a total sample size of N<sub>total</sub> = 34 would provide adequate statistical power. Please see the G*Power output in Author response image 1.

      As for the statistics, we acknowledge that vague methodological descriptions and complex algorithms made the methods and statistics difficult to follow. To address this, echoing the comment raised by Reviewer #1, we have removed the overly complicated statistics and methods, and further clarified how we used the generalized linear mixed-effects model (GLMM) for statistical analysis. Please see the specific revisions below:

      Methods Section (Page 8, Lines 378-403)

      “Statistics

      All statistical analyses were performed in R (https://www.rstudio.com/) using R packages.

      To clarify whether multiple-session HD-tDCS neuromodulation can reduce procrastination, a generalized linear mixed-effects model (GLMM) with a full factorial design was constructed for subjective procrastination willingness (i.e., self-reported visual analog scores) and actual procrastination behavior (i.e., real-world task-completion rate before the deadline). Sex, age, and socioeconomic status (SES) were modeled as covariates of no interest. Following the classification issued by the National Bureau of Statistics of China (https://www.stats.gov.cn/sj/tjbz/gjtjbz/), SES was divided into seven hierarchical tiers based on per capita annual household income, from 1 (poor) to 7 (rich). To control for rating bias stemming from participants’ daily mood, we separately measured daily emotional fluctuation at 10:00 and 16:00 using a self-rating visual analog item (i.e., “How do you feel about your mood today?”, 0 for “completely uncomfortable” and 100 for “definitely happy”). The average of these two emotion ratings was entered into the GLMM as a covariate of no interest, yielding the final model “outcome ~ Group*Treatment_Day + Age + Gender + SES + Emotions + (1 + Treatment_Day | SubjectID)”. This analysis was implemented using the “lme4” and “lmerTest” packages. Using the “emmeans” package, simple effects were also tested at baseline and after the last intervention using Tukey-adjusted pairwise comparisons of estimated marginal means from the full GLMM, controlling for covariates and the random-effects structure. To check statistical robustness, we also compared the number of tasks on which procrastination occurred between groups, using the nonparametric χ<sup>2</sup> test with φ correction or Fisher's exact test, rather than the continuous outcomes used in the parametric tests.

      Regarding the 6-month follow-up investigation, the same GLMM was also used to examine the long-term retention of the neuromodulation effect on actual procrastination.”
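      For the count-based robustness check, Fisher's exact test on a 2×2 table (group × whether procrastination occurred on a task) can be computed directly from the hypergeometric distribution. A minimal sketch: the function name is hypothetical, the two-sided p-value uses the common rule of summing the probabilities of all tables no more likely than the observed one, and the example is Fisher's classic tea-tasting table rather than study data.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact test for a 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance keeps tied probabilities from being dropped.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


print(round(fisher_exact_2x2([[3, 1], [1, 3]]), 4))  # 0.4857
```

      Exact enumeration is practical here because a 2×2 table with fixed margins has only a handful of possible configurations, which is also why it is preferred over the χ² approximation when expected cell counts are small.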

      The preregistration and ecological design (ESM) are commendable, but I was not able to find the preregistration, as reported in the paper.

      We regret that a serious technical problem has rendered our preregistration invisible and inaccessible. OSF disabled my account, stating that it had detected “suspicious user’s activities”. This has prevented access to all materials deposited in this OSF account, including the preregistration. We contacted the OSF team but received no workable technical solution for recovering the preregistered report (please see the screenshot below). We suspect that this may be related to my change of affiliation to the Third Military Medical University of People’s Liberation Army (PLA).

      To address this unexpected circumstance and to ensure transparency, we have explicitly reported this case in the main text and added a “Reconstructed Preregistration Statement” to the Supplemental Materials (SM). Because this no longer meets best practices for preregistration, in addition to reporting this case transparently, we have removed statements regarding preregistration elsewhere throughout the revised manuscript.

      Overall, the paper requires substantial clarification and tightening.

      We are grateful for your evaluation and fully agree. In response, we have added extensive detail clarifying how procrastination was measured, how the statistical analyses were conducted, and how real-life tasks and other experimental materials were collected. Please see the revisions in the Methods section of the revised manuscript. Again, thank you for these helpful suggestions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) In the Supplemental Materials, page 4, lines 163 to 167 seem to be from a different manuscript (as the section talks about neural markers, significant clusters, and brain networks).

      We apologize for mistakenly including this irrelevant section. We have removed it and double-checked the document to avoid similar mistakes.

      (2) I'm no expert here, but some of the trace and density plots in the SOM look problematic (e.g., Figure S5 top panel). But it's not made clear to which model/analysis these plots belong, so they are not very helpful without that information.

      Thank you for bringing these potentially problematic plots to our attention. Following your suggestion, we have removed these results from the SM to improve readability and comprehensibility.

      (3) Table S1 reports side effects "from the neurostimulation" (this is also the language used in the main manuscript), but having the flu is rather unlikely to be a side effect from the stimulation, isn't it? Thus, this language is highly confusing, and when reading the main text, it's not clear that these are just life events that are most likely unrelated to the stimulation, but have the potential to affect the measured variables (i.e., ultimately, they seem a source of noise).

      We apologize for this confusing wording. Here, “side effects” referred to confounding effects arising from unexpected life events that uncontrollably disrupt task execution and task performance, such as “having the flu” or “an unexpected mandatory CCP (Communist Party of China) meeting assignment”. To avoid misunderstanding, we have rephrased “side effects” as “unexpected life events disrupting task execution” in both the main text and the SM.

      (4) The use of the English language could be improved.

      Thank you for this very practical suggestion. As you suggested, we have engaged a proofreading editor to edit and polish the English of the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) It would be helpful to include greater detail about the ESM procedure and details of the self-reported tasks. This would help rule out potential confounds of difficulty or learning (e.g., participants may have learned to identify more achievable and less difficult tasks across the sessions, which would mean they are learning to perform the task better rather than to procrastinate less). Further elaboration on the quantification of procrastination measures would help clarify the mechanism underlying this behavior, which is important for clarifying how these effects arise and what aspect of procrastination behavior is being targeted by the tDCS intervention (and rule out alternative explanations).

      We wholeheartedly appreciate this crucial recommendation. As mentioned above, we have followed your suggestions, in particular by adding extensive details to the revised manuscript on how real-life tasks were collected (with consistent and plausible difficulty across sessions), how sampling time points were determined, and how the metrics were quantified (e.g., subjective procrastination willingness score, objective procrastination rate, AUC of task aversiveness, and task outcome value). We believe these revisions and clarifications were necessary and have substantially improved the readability and clarity of the manuscript. Please see the specific revisions and clarifications above.

      (2) It would be helpful to proofread for grammatical and spelling typos (e.g., DLPFC is spelled incorrectly in line 140, Satterwaite is spelled incorrectly in Line 415).

      Thank you for your kind suggestion. Both typos have been corrected, and we have double-checked the revised manuscript to ensure that no such typos remain. As you suggested, we have also engaged a proofreading editor to edit and polish the English of the revised manuscript.

      (3) Please clarify in Figure 4 that a higher AUC is associated with lower task aversiveness (which is stated in the methods but not clearly in the figure).

      Many thanks for your helpful suggestion. As you suggested, we have clarified this point in the figure legend.

      Reviewer #3 (Recommendations for the authors):

      I want to see the preregistration.

      Thank you for your helpful recommendation. As we replied above, a serious technical issue on OSF made our preregistration invisible and inaccessible: OSF disabled my account, stating that it had detected “suspicious user’s activities”. As a result, we no longer have access to any of the materials deposited in this OSF account, including the preregistration. We have reconstructed the preregistration from archived documents and reported it in the SM. As noted above, although this partially addresses the problem, it no longer meets best practices for preregistration. Consequently, in addition to reporting this case transparently, we have removed all preregistration statements throughout the revised manuscript.

    2. Reviewer #2 (Public review):

      Summary:

      Chen and colleagues conducted a cross-sectional longitudinal study, administering high-definition transcranial direct current stimulation (HD-tDCS) targeting the left DLPFC to examine the effect of HD-tDCS on real-world procrastination behavior. They find that seven sessions of active neuromodulation to the left DLPFC elicited greater modulation of procrastination measures (e.g., task-execution willingness, procrastination rates, task aversiveness, outcome value) relative to sham. They show that HD-tDCS reduces task aversiveness and increases task-execution willingness on real-world tasks as quantified by intensive experience sampling methods, providing causal evidence for the role of DLPFC in modulating the contextual features that lead to delaying or completing one's goals.

      Strengths:

      • This is a well-designed protocol with rigorous administration of high-definition transcranial direct current stimulation across multiple sessions. The intensive experience sampling approach, which probes and assesses self-relevant task goals, is innovative and aims to address an important question regarding the specific role of DLPFC in modulating specific features of chronic procrastination behavior (e.g., task-execution willingness, task aversiveness).

      • The quantification of task aversiveness through AUC metrics is a clever approach to account for the temporal dynamics of task aversiveness, which is notoriously difficult to quantify.

      Weaknesses:

      • While the finding that neurostimulation reduces procrastination behavior is compelling, several alternative interpretations of these effects remain. For example, it could be that task-execution willingness isn't increased per se, but rather that goal completion becomes more valuable as participants learn from feedback or become more aware of their successful attainment of, or failure to complete, task goals. It is unclear whether the effects could be driven by improved working memory or attention to the reported tasks (and this limitation is addressed by the authors). It is also difficult to examine the temporal dynamics of how these goals are selected across time.

      • It is unclear whether the current evidence supports long-term retention of this neurostimulation intervention. The study includes a single 6-month timepoint after the study to examine the long-term retention of the neural stimulation effect. Future studies that evaluate the long-term effects across multiple time points would strengthen the evidence for the robustness of this intervention.

    3. eLife Assessment

      This valuable cross-sectional longitudinal study leverages high-definition transcranial direct current stimulation to the left dorsolateral prefrontal cortex to examine its effect on procrastination behavior over an extended time span. The cross-sectional longitudinal study provided evidence for how stimulating the DLPFC impacts real-world procrastination behavior. Support for the conclusions is incomplete owing to missing information about the analyses and results, as well as some potential alternative interpretations.

    4. Reviewer #1 (Public review):

      Summary:

      The authors report the results of a tDCS brain stimulation study (verum vs sham stimulation of left DLPFC; between-subjects) in 46 participants, using an intense stimulation protocol over 2 weeks, combined with an experience-sampling approach, plus follow-up measures after 6 months.

      Strengths:

      The authors are studying a relevant and interesting research question using an intriguing design, following participants quite intensely over time and even at a follow-up time point. The use of an experience-sampling approach is another strength of the work.

      Comments on revisions:

      Overall, I think the authors made many improvements to their manuscript. There are, however, still a number of concerns that first need to be addressed, since it is still not currently possible to fully evaluate the analyses, results, and conclusions presented in the paper. I list these points below:

      (1) The authors still use causal language where they must not use causal language. This is true for many places in the manuscript; I am highlighting here just a few places, but the authors nevertheless have to go carefully through the whole manuscript to change these instances.

      Some examples:

      (a) In response to my comment (1) in the previous round, where the authors adjusted their text, the authors still use causal language in their last sentence "... procrastination behavior has been observed to impair general health..." Unless the cited study truly allowed causal conclusions, the causal language should be removed here as well.

      (b) The authors still make (causal) claims about the involvement of self-control in their observed results. To reiterate from the previous round of revisions: The authors cannot make any strong claims about the role of self-control processes because they do not directly measure self-control nor do they directly manipulate self-control or have a design that would rule out alternative mechanisms other than self-control. Therefore, their claims about self-control have to be toned down. It is laudable that the authors have added a statement towards the end of their discussion about not being able to make strong conclusions about the role of self-control. But the authors need to use similar careful wording not just at the end of the discussion but throughout the manuscript.

      (i) In the abstract, the authors use the formulation "...conceptualized roles of self-control on procrastination..." -- this wording is still too strong, suggesting that you actually studied self-control.

      (ii) In the introduction (page 4, lines 162-169), the way the authors formulate these sentences suggests that they directly measured self-control. Again, the authors need to make it explicit that they are not directly measuring self-control but its hypothesized down-stream consequences on valuations/behavior.

      (iii) In the discussion, for example, on page 11, lines 555 and following, the authors write:

      "One major contribution this study has made is to disentangle the neurocognitive mechanism of procrastination by demonstrating that self-control could increase task-outcome value so as to reduce procrastination."

      Again, please be aware that you are NOT demonstrating that self-control does anything, since you only measure procrastination rates, outcome values, and task aversiveness. It is possible that mechanisms other than self-control might be relevant for this. Perhaps neuromodulation directly increases outcome values, without involvement of self-control processes. You simply cannot know that and therefore you cannot make those claims in the form that you are making them. You can write that the observed results are consistent with the idea that neuromodulation might have had an effect on self-control and this in turn might have affected outcome values. But you also need to make it explicit that, to substantiate these claims, you would need more direct evidence that indeed self-control was involved. These more careful formulations would not at all reduce the value of your work, but indeed they would rather demonstrate your carefulness in interpreting the results you obtained.

      (2) I am still puzzled by the power analysis. In the text, you write that a sample size of 18 participants (i.e., 9 per group) would be sufficient to achieve 80% power. I still feel this seems far too optimistic and hard to believe, but that is not my point here. While in the text, you write that you need 18 participants, the G*power output seems to suggest a sample size of 34, not 18. Why this contradiction? Or is it not contradictory? If it is not, then please explain it more fully.

      (3) I have several comments about the mixed-effects analysis.

      First of all, I want to thank the authors for adding more details, things have become much clearer now. However, I still have a few questions and comments related to these analyses:

      (a) The variable Emotions was within-subjects, as far as I understood. Accordingly, Emotions should most likely be modelled with random slopes varying over participants (in addition to being modelled as a fixed effect).

      (b) The analyses still cannot fully be evaluated as I cannot access the scripts and data. The authors mention that the scripts and data should be available via a link they provide (https://doi.org/10.57760/sciencedb.35140). However, when I try to access these materials via this link, no page opens; it seems the link is dead?

      (c) What are the results and conclusions if you do not include the covariates of no interest? I.e., please re-run your main models without age, gender, SES, Emotions.

      (d) The authors mention that they use GLMMs, which would suggest generalized mixed-effects models, but they do not describe what family/distribution they used. Since they mention lmerTest and seem to report F-tests, my guess is that they used Gaussian models. However, both their DVs (procrastination rates and their ratings) are bounded variables and at least procrastination rates hit the lower boundary. That can mean that their analyses suffer from inflated Type 1 and/or Type 2 error rates. Therefore, please repeat the analyses with an appropriate generalized mixed-effects model (perhaps a beta regression type of model?).
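      One common preparatory step for the beta-regression route suggested here is to rescale the bounded outcome into the open interval (0, 1), because the beta distribution is undefined at exactly 0 and 1 (e.g., tasks with a 0% procrastination rate). A minimal sketch of the Smithson and Verkuilen (2006) transformation y' = (y(n - 1) + 0.5)/n, assuming scores on a 0-100 scale are first divided by 100 and that n is the total number of observations; the function name and example values are hypothetical, and the sketch only prepares the data rather than fitting the mixed model.

```python
def squeeze_unit_interval(values, scale=100.0):
    """Map bounded scores on [0, scale] into the open interval (0, 1).

    Applies y' = (y * (n - 1) + 0.5) / n to the proportions y = value / scale,
    where n is the number of observations (Smithson & Verkuilen, 2006).
    """
    n = len(values)
    return [((v / scale) * (n - 1) + 0.5) / n for v in values]


rates = [0, 50, 100]  # hypothetical procrastination rates in percent
print([round(y, 4) for y in squeeze_unit_interval(rates)])  # [0.1667, 0.5, 0.8333]
```

      With realistic sample sizes the shift is tiny (on the order of 0.5/n); its only purpose is to move boundary values off 0 and 1 so that a beta likelihood is defined for every observation.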

      (e) When reporting the results of the mixed-effects models, the authors report the regression coefficient, standard error, DFs and p value, but not the actual test statistic. Please add the information about the test statistic and report all degrees of freedom (in case of F tests that would be the degrees of freedom of the test and the residual degrees of freedom).

      (f) Thank you for adding the analysis where you remove the last two sessions. But currently you present them in the manuscript without explaining/motivating why you do this. Please add this motivation, as otherwise it will be puzzling for the reader why you conduct these analyses.

      (4) Mediation analysis

      In your manuscript, you present some mediation analyses. Please be aware that such mediation analyses cannot establish causality and they suffer from extremely high Type 1 error rates (see, e.g., https://datacolada.org/103).

      My suggestion would be to completely remove all mediation analyses. However, if you want to keep them, then you need to be extremely careful in how you present the results. You need to explicitly mention that you cannot derive any causal conclusions from them and that simulation studies have shown that such mediation analyses suffer from extremely high Type 1 errors.

      As an example (but the mediation results are mentioned in several places, for example, also in the abstract):

      On page 10, lines 501-503: What you can causally conclude is that neuromodulation affects your measured variables (outcome values, procrastination rates, task aversiveness), but you cannot conclude that the effect of neuromodulation on procrastination rates causally operates via outcome values. Thus, please adjust the formulation accordingly. The same applies to the mediation section that follows right afterwards (page 10, lines 505-522).

      (5) In the introduction, the authors introduce several theoretical procrastination frameworks (TMT, mood repair, TDM). Do the results of the current paper help to decide which framework might be the most appropriate, at least for the authors' data set? It might be of interest to address this explicitly.

      (6) The language is sometimes hard to understand and seems in quite some places grammatically incorrect. Thus, I think the paper would profit very much from thorough English proofreading.

    1. eLife Assessment

      This study provides direct and compelling evidence that lamellipodial protrusions dynamically adjust Arp2/3 complex incorporation in response to mechanical counterforces, while also modulating cellular responsiveness to upstream signals like Rac GTPase. By combining endogenous labeling, live-cell imaging, and optogenetic signaling activation, the work demonstrates how adhesion state and physicochemical perturbations reproducibly alter branched actin organization, offering a fundamental advance over previous works. The findings deliver significant insights that will resonate broadly with cell biologists and biochemists studying actin dynamics and mechanotransduction.

    2. Reviewer #1 (Public review):

      Summary:

      This is an interesting study describing intensity changes of lamellipodial Arp2/3 complex incorporation dependent on the substratum the cells are spreading on (PLL vs fibronectin), but also on manipulation of either contractility or osmotic pressure or even external mechanical load exerted onto cells, e.g., by increasing medium viscosity. The authors use quite fancy cell systems for their studies, first of all, a CRISPR-engineered fibroblast cell line in which both endogenous loci of the Arp2/3 complex subunit Arpc2 are tagged with mScarlet, but at the same time, conditionally removable using tamoxifen. These lines, optionally also harboring Pxn-GFP and Lifeact-miRFP670, have previously been described by the authors (Chandra et al, 2022, PMID: 34861242). In addition, they use cells allowing local photoactivation of Rac signalling through a Tiam1 activation module combined with Halo-tagged Arpc2, apparently stably co-expressed in tamoxifen-treated Arpc2-KO fibroblasts. These cells may or may not have been published previously.

      Overall, the study provides convincing evidence that Arp2/3 complex accumulation in the lamellipodium negatively correlates with lamellipodial width and perhaps with the mechanical load these actin networks experience at the leading edge membrane. This is shown initially by allowing cells to spread on substrates on which the formation of integrin-based adhesions is poor (PLL) or stimulated (fibronectin). In the latter case, lamellipodia are comparably narrow, perhaps reasonably well clutched, and thus experience sufficient counter-force at the leading edge membrane to build a dense, Arp2/3-dependent actin network. Although these results are interesting and important to show, they are not entirely surprising given the published literature on actin remodeling in cells under conditions similar to those used by the authors (i.e., on PLL). The results should therefore be better embedded in the context of this previous literature, to reveal more precisely which aspects are new and interesting and which are more or less intuitive and expected.

      However, the authors also show another, quite spectacular result: a dramatic local protrusion of a Rac-dependent lamellipodium on PLL only in the presence of methylcellulose, but not on PLL alone. Although the authors cannot fully explain the mechanisms causing these results, they are thought-provoking and will certainly stimulate relevant future research and new insights on this topic. Altogether, I think this is an interesting study that can be shared rapidly, provided the authors supply more experimental detail and transparency concerning the cell model systems used. Aside from a few other suggestions for amendments and corrections, I would also recommend citing the classical literature that provides the basis for interpreting the results shown here, as specified below.

      Specific criticism and comments:

      (1) I feel the paper is interesting for actin remodeling and Arp2/3 complex aficionados, but in places quite difficult for non-experts in the field to read and understand. I think the text requires more detailed explanation of specific terms and of the model systems used, as well as general correction of grammatical and semantic errors and colloquial language.

      (2) In general, I think the characterization of Arp2/3 complex incorporation into the lamellipodia of cells spreading on PLL versus FN is interesting, as to my knowledge it has not been done previously in such a systematic fashion. However, the authors could better emphasize how this relates to previously established structural features of actin filament networks in cells on PLL. For instance, more than three decades ago, Hotchin & Hall published clear evidence that starved fibroblasts can only form focal complexes or adhesions downstream of PDGF or LPA stimulation if seeded on FN, but not on PLL (see Figure 1 in PMID: 8557752). Around the same time, Flinn and Ridley showed that this virtual absence of classical, Rac-dependent focal complexes is accompanied by the formation of beautiful, broad lamellipodia (see Fig. 1A in PMID: 8743960), which, incidentally, only formed in the absence of excess RhoA activity and thus contractility (see also below). A few years later, Small et al summarized all these phenotypes in a comprehensive review and also showed that cells on PLL (similar to the rapidly migrating keratocytes) combined large, flat lamellipodia with tiny, nascent adhesions scattered throughout these structures (see Figure 2 in PMID: 10047522). These authors also noted that inhibitor-mediated reduction of contractility alone could switch FN phenotypes, with narrow, ruffling lamellipodia and peripheral focal complexes, back to a PLL-type phenotype of broad lamellipodia (see Figure 1 in PMID: 10047522).
In the following decade, different labs (Verkhovsky, Bershadsky, Vavylonis, Watanabe et al) showed beautiful phase contrast or fluorescence movies illustrating that the broad lamellipodial phenotype of cells plated on PLL was accompanied by low-frequency membrane ruffling and, instead, by a rapid, continuous rearward flow of continuously assembling actin filament networks, partly also shown directly with actin networks labeled with both LifeAct and Arp2/3 complex subunits (see e.g. PMIDs 18800171 and 22500749). In Alexandrova et al, 2008 (PMID 18800171), the authors showed that the formation of adhesions in spreading cells triggers the transition from fast to slow flow (which is of course relevant to the current study and conclusions), whereas Ryan et al, 2012 (PMID 22500749) had already established the broad incorporation of actin and Arp2/3 complex into the very broad lamellipodia formed on PLL by Xenopus fibroblasts, and the rapid flow of both components from distal to proximal lamellipodial regions. None of these seminal studies has been cited, although they are highly relevant for the interpretation and conclusions of the results presented. I would strongly recommend referring specifically to these studies, as this will actually support the conclusions and interpretations drawn.

      (3) On the subject of literature: at the end of the second paragraph on the second page of the Introduction, the authors describe Rac signaling to Arp2/3 complex through WRC, which is considered essential for Arp2/3-mediated actin assembly at lamellipodial leading edges, but, aside from one of their own papers, they cite none of the seminal studies on this pathway from the Insall, Scita, Stradal, Rottner, and Bogdan labs.

      Considering the rapid F-actin flow in lamellipodia, accompanied by admittedly sparse but continuous Arp2/3 complex incorporation, it is not very surprising that Arp2/3 complex is required here, as is the accumulation of its prominent activator WRC and of the branch stabilizer cortactin. Thus, the data described on page 3 of the Results section could also be framed in the context of this previously published knowledge, providing a more comprehensive and realistic view of the relevance and novelty of the data.

      (4) In the abstract, in the context of the force-feedback mechanism established in vitro for the formation of Arp2/3 complex-dependent actin networks, the authors state that "this phenomenon has not been explored through the examination of real-time responses of endogenous actin networks in cells". In my view, this is not correct, as the Sixt laboratory did exactly that in their prominent Cell paper (Mueller et al, 2017, PMID: 28867286). Although, as far as I recall, Mueller et al did not look at Arp2/3 complex dynamics, they still connected the extent, and hence intensity, of actin networks at the leading edges of keratocyte lamellipodia with the forces exerted onto them, including direct experimental manipulation of those forces. Although the study has been cited in a different context, this point should be made clear, and the corresponding sentence in the abstract should be amended.

      (5) One point that struck me was the authors' detailed description of cell spreading on PLL and the quite strong variability of Arp2/3 incorporation depending on the time after seeding (for instance, the very strong and quite narrow Arp2/3 leading edge intensity at 2 hours post-seeding in Figure 3S2D). The authors consider their system very clean, emphasizing that they even eliminated the FN locus in their cells to exclude secretion of endogenous FN (PMID: 34861242), but what about ECM components potentially present in serum, such as vitronectin? Indeed, as far as I can see, the authors performed all experiments in the presence of 10% serum, although most of the classical PLL experiments mentioned above were performed with starved cells in the absence of serum. I think it would give a more complete picture of the phenotypes and results relative to the literature if the authors performed a subset of the key experiments on PLL without serum. I do not think starvation as such is important; its effects could be counteracted simply by adding lamellipodia-inducing growth factors to the spreading medium, traditionally PDGF or EGF (depending on the receptor distribution of these fibroblasts). The absence of serum, however, would have two advantages: it would not only exclude any potential impact of serum-derived ECM components, but also alleviate the hyperstimulation of the Rho pathway by LPA-bound BSA, the major serum protein, which has previously been shown to counteract the "undisturbed" formation of PLL-type lamellipodia (see Figure 1B in Flinn & Ridley, PMID: 8743960).

      (6) Regarding the scanning EM images shown in the Supplement, currently labeled Figure 3S2A and -B (erroneously termed Figures 3S1A and -B in the text, see above): I am not sure how representative these individual EM images of the cell plated on PLL are, given the data showing rapid rearward flow of actin and Arp2/3 complex subunits, at least at early stages of spreading. Again, the classical literature on PLL-type lamellipodia and, in particular, previously published movies of such lamellipodia suggest broad lamellipodia with few ruffles on PLL, and the opposite for cells plated on FN. In this context, the scanning EM data shown for both PLL and FN fit neither the authors' own data nor the literature very well, and I would recommend making sure that the individual cells selected were (i) correctly annotated and (ii) representative of a specific time point of spreading consistent with the previously described data.

      (7) I was also surprised that the authors describe spreading on PLL as much slower than on FN (see Figure 3S2C, termed Figure 3S1C in the text). It is tempting to speculate that this might change if the cells were plated in serum-free medium, as full spreading and lamellipodia formation downstream of PDGF stimulation (at least in 3T3 fibroblasts) is traditionally described to occur within 10-30 minutes at most, not over several hours as shown here. This point could also be considered, or at least discussed.

      (8) The movies are of very high quality and beautiful to look at, but more information in the legends would help the reader, such as the meaning of the time stamps (which I assume display elapsed time in minutes:seconds, but this information is missing from the legends as far as I can see). It would also help the reader if the movies marked more clearly when a specific treatment begins. For instance, the legend of movie 10 states that treatment starts at 10:00 (minutes:seconds?), but it would help very much if the authors inserted the label "blebbistatin" directly into the movie, beginning with the frame at which treatment starts.

    3. Reviewer #2 (Public review):

      The authors work with endogenously labeled Arp2/3 complexes in mouse fibroblast cell lines plated on surfaces coated with fibronectin or poly-L-lysine. They observe increased retrograde flow, but decreased actin and Arp2/3 densities, in the absence of integrin-based adhesions. Interestingly, they further find that an increase in branching density can be achieved in the absence of adhesion by a diverse set of perturbations, including blebbistatin, physical compression under agarose, and methylcellulose-mediated increases in extracellular viscosity. Although all of these conditions are likely to have pleiotropic effects on cell physiology and signaling, one plausible common denominator is that they promote cell spreading and may thereby increase membrane tension.

      This study addresses a question of broad interest. The relationship between protrusive actin assembly, resisting forces, and membrane tension has received considerable attention in recent years (for a recent overview, see PMID: 38991476). Earlier work established that branched actin networks can respond to force by increasing network density in vitro (PMID: 26771487; PMID: 35748355), and pioneering work from the Sixt laboratory showed that keratocyte lamellipodia adapt to resisting forces by increasing actin density in cells (PMID: 28867286). Against that background, the manuscript contains novel and insightful observations. At the same time, the current version would be strengthened by a more rigorous mechanistic analysis and by clearer reporting of experimental systems and statistics.

      Major points:

      (1) Engagement with prior work on membrane tension and protrusion.

      The relationship between protrusive actin assembly and membrane tension is a subject of major current interest (PMID: 38991476), and it is unfortunate that the authors do not engage more fully with seminal prior work on this subject. In particular, work from the Weiner laboratory showed that membrane tension can act as an inhibitor of cell protrusion and branched actin assembly, at least in some cell types (PMID: 22265410; PMID: 37311454). In addition, a membrane-tension-sensitive signaling pathway involving PLD2 and mTORC2 has been proposed to mediate this negative feedback (PMID: 27280401). These findings appear, at least at first glance, to contrast with the model advanced here, in which elevated membrane tension is associated with increased branching density. A more explicit discussion of these findings and of the apparent differences between systems would be essential. Testing the relevance of some of the proposed negative-feedback regulators, for example, mTORC2 or PLD2, under at least some conditions expected to increase membrane tension would substantially strengthen the manuscript.

      (2) The central assumption regarding membrane tension should be tested directly.

      Part of the model put forward by the authors rests on the assumption that most of the perturbations used to promote cell spreading, with the exception of hyperosmotic treatment, also increase membrane tension. This is a testable hypothesis. Multiple mechanical and optical methods have been established for this purpose, including tether pulling, micropipette aspiration, and fluorescent membrane-tension probes. Directly measuring membrane tension under at least a subset of the key perturbations would significantly strengthen the manuscript.

      (3) WAVE and cortactin localization should be quantified.

      The claim that WAVE and cortactin localization are independent of fibronectin-integrin engagement (Figure 2A-B) deserves to be established quantitatively. I appreciate that some variability is expected because these experiments use exogenous fluorescently tagged constructs, but the current presentation relies too heavily on representative kymographs. Quantitative analysis would make this conclusion more convincing.

      (4) The interpretation of the increased-viscosity experiments needs stronger physical justification.

      I am aware of the recent high-profile work showing that elevated extracellular viscosity can promote migration (PMID: 36323783), and the present manuscript clearly supports this. However, the physical basis for this perturbation is neither well reasoned nor clearly explained here. The authors use 0.6% methylcellulose of the 1500 cP grade (incidentally, the viscosity of the final medium should be stated explicitly!). Taking the added viscosity to be about 7 cP = 0.007 Pa·s (raising the medium viscosity from about 1 to 8 cP), one can make the following rough back-of-the-envelope estimate of the added viscous stress:

      Δτ = Δη · v / h

      where Δτ is the added viscous stress (Pa = pN/µm²), Δη the added viscosity, v the protrusion speed, and h a characteristic shear length scale. For cells protruding at 1 µm/min and h of roughly 0.1-10 µm, this resistance is about 0.00001-0.001 Pa. Even if the cells protruded 100 times faster, the resistance would not exceed 1 Pa! Hence, the added bulk viscous stress opposing protrusion at this viscosity appears negligible relative to the known force-generating capacity of lamellipodia. This does not invalidate the biological phenotype, but it does suggest that the interpretation should be much more careful.
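      The order of magnitude in this estimate is easy to reproduce. The sketch below is a minimal illustration that assumes the reviewer's Δη of 0.007 Pa·s and treats the shear length scales h of 0.1, 1 and 10 µm as illustrative assumptions, not values from the manuscript; it simply evaluates Δτ = Δη · v / h after converting to SI units.

```python
"""Order-of-magnitude check of the added viscous stress, delta_tau = delta_eta * v / h."""


def added_viscous_stress(delta_eta_pa_s: float, v_um_per_min: float, h_um: float) -> float:
    """Return the added viscous shear stress in Pa (numerically equal to pN/um^2)."""
    v_m_per_s = v_um_per_min * 1e-6 / 60.0  # convert um/min to m/s
    h_m = h_um * 1e-6                       # convert um to m
    return delta_eta_pa_s * v_m_per_s / h_m


DELTA_ETA = 0.007  # Pa*s, the reviewer's estimate of the added viscosity (~1 cP -> ~8 cP)

for speed in (1.0, 100.0):       # um/min: stated protrusion speed and a 100x faster case
    for h in (0.1, 1.0, 10.0):   # um: assumed (hypothetical) shear length scales
        tau = added_viscous_stress(DELTA_ETA, speed, h)
        print(f"v = {speed:5.0f} um/min, h = {h:4.1f} um -> delta_tau = {tau:.1e} Pa")
```

      Under these assumptions every value stays below about 0.12 Pa, in line with the conclusion that the added bulk stress remains well under 1 Pa.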

      (5) Cell lines and experimental systems are insufficiently described.

      Most biological experiments in this manuscript appear to have been performed in engineered mouse fibroblast lines, but the Methods do not make clear which specific cell lines were used in which experiments. More concerningly, the manuscript refers inconsistently to the base model as both a mouse dermal fibroblast line and MEFs, while the only clearly distinct named line appears to be the JR20 fibroblasts used for traction-force microscopy. Along similar lines, neither the Results, the Methods, nor the figure legends adequately explain how the Arp2/3 knockout cells in Figure 2 were generated or how the knockout was validated. Only later, in the Discussion, do the authors note that these conditional knockouts were described in an earlier paper. In general, the manuscript would benefit from much more explicit reporting of which cell line or derivative was used in each experiment.

      (6) Some experiments and quantifications appear to suffer from limited replication.

      For example, the optogenetic Rac activation experiment in Figure 2E may have been performed on only a single cell per condition, since the raw intensity traces are shown without clear indicators of variability. If that reading is correct, this is below the standard typically expected for mechanistic support and seriously reduces confidence in this particular conclusion.

      (7) Statistical reporting needs clarification.

      Although the Methods state that the graphs show 95% confidence intervals, the manuscript does not clearly define the underlying statistical unit for many quantified datasets. In several figures, sample sizes are reported as numbers of cells pooled across only two or three independent experiments, but it is not clear whether the authors performed statistical analyses on pooled single-cell measurements or on experiment-level means. The authors should explicitly state for each quantified panel what n represents, what the error bars denote, which statistical test was used, and whether the analyses were performed on per-cell values or on independent experimental replicates.
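      To illustrate why the statistical unit matters, the following sketch computes a 95% confidence interval for the same dataset in two ways: treating every cell as an independent observation, and first averaging cells within each independent experiment. The data, the three-experiment design and the small t-table are hypothetical assumptions for illustration only, not values or methods from the manuscript.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical values by degrees of freedom (df = n - 1).
# Extend this table (or use a statistics library) for larger samples.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
       7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201, 12: 2.179}


def ci95(values):
    """Mean and 95% CI half-width, treating each value as one independent unit."""
    n = len(values)
    return mean(values), T95[n - 1] * stdev(values) / sqrt(n)


def replicate_means(cells):
    """Collapse (experiment_id, value) pairs to one mean per independent experiment."""
    groups = {}
    for experiment, value in cells:
        groups.setdefault(experiment, []).append(value)
    return [mean(vals) for vals in groups.values()]


# Hypothetical per-cell measurements from three independent experiments.
cells = ([("exp1", v) for v in (1.0, 1.2, 1.1, 0.9)]
         + [("exp2", v) for v in (1.5, 1.6, 1.4, 1.5)]
         + [("exp3", v) for v in (0.8, 0.7, 0.9, 0.8)])

pooled_mean, pooled_hw = ci95([v for _, v in cells])   # n = 12 cells
rep_mean, rep_hw = ci95(replicate_means(cells))        # n = 3 experiments
print(f"pooled cells:   {pooled_mean:.2f} +/- {pooled_hw:.2f} (n = {len(cells)})")
print(f"per experiment: {rep_mean:.2f} +/- {rep_hw:.2f} (n = 3)")
```

      With identical means, the per-experiment interval is several times wider than the pooled one, because n drops from 12 cells to 3 experiments and between-experiment variability is no longer hidden.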

      (8) The Discussion is rather expansive relative to the amount of experimental evidence presented.

      Parts of the Discussion feel more speculative and interpretive than necessary, and the manuscript would be strengthened by focusing the Discussion more tightly on the principal findings, limitations, and immediate implications of the work.

    4. Reviewer #3 (Public review):

      Summary:

      Butler et al. investigated how different force mechanisms influence Arp2/3-dependent branched actin networks at the leading edge of lamellipodial protrusions in mouse dermal fibroblasts. In particular, their study aimed to characterize the specific contribution of, and interplay between, load force and adhesion signaling in regulating branched actin networks in live cells, using an endogenously labeled Arp2/3 subunit. A key finding of their work is that, when fibroblasts are plated on two substrates that either support or do not support integrin engagement, branched network architectures differ strikingly in ways that cannot be explained by integrin signaling alone. Instead, several of their results point to mechanical feedback resulting from changes in membrane tension during spreading, which regulates the density of branched actin networks. Finally, by modifying the extracellular viscosity, the authors suggest that the stress generated at the actin-membrane interface plays a key role in regulating branched actin density in protrusions.

      Major Strengths:

      (1) The combination of methods used in this paper (endogenous labeling of Arp2/3, Arp2/3 genetic knockout, optogenetic activation of Rac) provides a unique opportunity to monitor spatial and temporal reorganization of endogenous branched networks generated by Arp2/3 in live cells in response to different biochemical and mechanical manipulations.

      (2) The authors provide a deep characterization of actin network organization and dynamics in cells plated on substrates that do or do not engage integrins (Figure 1 and associated supplementary figures: intensity and width of the signal in protrusions, retrograde flow, actin incorporation at the edge, nascent focal adhesions), which serves as a strong basis for the rest of the paper. They also offer a comprehensive analysis of the parameters that could explain the lack of a dense branched actin network at the leading edge of fibroblasts grown on PLL-coated surfaces (in Figure 2, they exclude contributions from reduced branch nucleation by NPFs, insufficient branch stabilization, and insufficient integrin-mediated signaling to activate NPFs).

      (3) After ruling out the influence of adhesion signaling on branched actin network density at the leading edge, the authors demonstrate that Arp2/3 enrichment at the leading edge evolves together with cell spreading, suggesting a possible role for membrane tension in the process (Figure 3 and associated supplementary figures). To test this, they used numerous methods to promote adhesion-independent cell spreading (Figures 4 to 6), while clearly describing the limitations of each technique. These methods included promoting rapid spreading on PLL-coated substrates using blebbistatin or physical compression under agarose, and finally increasing extracellular viscosity by treating cells with methylcellulose. All of these treatments, each expected to increase membrane tension, gave very consistent results, supporting the idea that membrane tension controls the organization of branched actin in cells. This conclusion was further supported by an experiment (Figure 4 S1) in which a hyperosmotic shock, which increases actin-membrane interface stress while maintaining the cells' spread area, led to a drastic increase in Arp2/3 density in protrusions.

      (4) By activating Rac optogenetically in cells plated on PLL treated with methylcellulose (Figure 8), the authors observe the formation of robust protrusions enriched in Arp2/3, showing that increased extracellular viscosity can bypass the requirement for ECM proteins to activate protrusion driven by signaling.

      Weaknesses:

      (1) Although the lamellipodial architecture in cells plated on PLL appears very different from that of cells grown on fibronectin (Figure 1; wider and less homogeneous), the branched network is still present, and one may wonder how these differences affect lamellipodial function (which could be assessed, for example, by measuring the impact on migration in 2D and 3D systems).

      (2) To explain the differences between the branched actin networks of cells on PLL and FN, the authors consider several hypotheses, including a decrease in signaling factors or branch-promoting factors in the absence of integrin adhesions. They could additionally have assessed actin network dynamics and turnover (one could imagine that competition between Arp2/3- and non-Arp2/3-driven structures differs in the presence or absence of adhesions; this competition is nicely visible in Figures 2B and 2C, where, in the absence of Arp2/3, cells form prominent filopodia).

      (3) All of the methods used to apply physical forces to barbed ends have their own caveats and alter more than just membrane tension (although these limitations are discussed in the paper). The paper might have benefited from micropatterning to either restrict or force cell spreading in a controlled fashion. In addition, the conclusions on levels of interface stress between the plasma membrane and the barbed ends of lamellipodial actin networks rely on estimates of the effects of perturbations rather than on actual measurements of these stress levels.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      Although the finding that branched actin networks respond to the application of physical force by increasing their density was already known from previous in vitro studies, this paper offers a detailed and compelling characterization of the reorganization of endogenously labeled branched actin networks upon different mechanical perturbations. In addition to showing that increased extracellular viscosity promotes branched actin network densification in the absence of ECM, this paper sheds new light on the interplay between signaling and mechanics in regulating protrusion and spreading. While the authors show that both signaling and mechanical feedback are important regulators of branched actin and cell spreading, they demonstrate that optogenetic Rac activation is not sufficient to trigger branched network formation in the absence of sufficient mechanical support. They thus propose that biochemical signaling acts at a different level than mechanics, by promoting protrusion persistence and coherence. This work will therefore impact the field of cell biology by offering a new perspective on the interplay between mechanical and biochemical feedback in 2D and 3D migration. It may also have broader implications, as branched actin network formation regulated by mechanical load has been shown to be involved in other processes, such as endocytosis.

    1. eLife Assessment

      In this study, Yuan and colleagues perform transcriptomic and epigenomic experiments to study open chromatin regions and transcripts that change upon larval settlement in the sponge Amphimedon. The authors present compelling evidence to show that sponge larvae prepare for receiving an environmental cue (sunset) by extensively modifying their chromatin accessibility in the vicinity of genes that are going to be regulated during metamorphosis. The study represents a fundamental advance in understanding the fine genetic control of larval settlement and has significance beyond the immediate field of sponge larval biology.

    2. Reviewer #1 (Public review):

      Summary:

      Yuan and colleagues present a thorough study of gene activation before and during metamorphosis in sponge larvae, combining in-depth analyses of staged transcriptomes and chromatin accessibility profiling (ATACseq). Amongst several very interesting findings, the study reveals that the acquisition of settlement competence, which arises in response to decreasing light at sunset, is characterized by changes in chromatin accessibility that anticipate strong transcriptional shifts occurring as metamorphosis starts. Another notable finding is a set of transcription factors amongst the genes strongly up-regulated at the onset of metamorphosis. In addition, larvae exposed to constant light, a condition that stalls metamorphosis, were found to activate metabolic pathways that are not normally expressed in swimming larvae. Together, the findings provide a rare level of understanding into how environmental conditions can promote deployment of alternative developmental programs in planktonic larvae.

      Strengths:

      This is a very comprehensive, well-documented and rigorous study of a phenomenon of wide interest. It will inspire researchers working on other species to look for similar, environmentally-driven "anticipatory" epigenetic mechanisms. It also provides a wealth of detailed information on genes, notably transcription factors, that are candidates for involvement in regulating specific metamorphosis transitions - and beyond. The data presented here are thus undoubtedly a rich and valuable resource.

      Weaknesses:

      I see no significant weaknesses; however, the documentation of the data is very compressed, with all the findings contained in 4 multi-panel figures with succinct legends. It is not always straightforward to connect the conclusions stated in the text to the figures. Although the relevant data are available in supplementary files, I would appreciate more help in navigating the data to assess the support for key conclusions, if possible by illustrating each conclusion in the text explicitly in the main figures.

    3. Reviewer #2 (Public review):

      Summary:

      It is demonstrated that sponge larvae prepare for receiving the environmental cue (sunset) by extensively modifying their chromatin accessibility in the vicinity of genes that are going to be regulated during metamorphosis, in the absence of large gene expression changes. This program can be offset by modifying the cue (making light constant), leading to a novel molecular state.

      Strengths:

      This is a top-notch study of a key life-cycle transition in an organism of great phylogenetic importance, involving concurrent gene expression and chromatin accessibility profiling (to the best of my knowledge, this has never been done in non-bilaterians and likely not anywhere outside Vertebrata). The result is highly non-trivial. An additional experiment modifying the key environmental cue (constant light) provides further insight.

      Weaknesses:

      I have only a couple of suggestions.

      (1) Not all of the new, pre-emptively opened OCRs are associated with genes that are going to be regulated during metamorphosis. Is their association with such genes statistically significant (e.g., by Fisher's exact test)?
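      A minimal sketch of the suggested test, assuming each gene is classified by two yes/no properties (a newly opened OCR nearby, and regulation during metamorphosis); how "nearby" and the background gene set are defined is an assumption that strongly influences the result. It computes the one-sided Fisher's exact p-value from the hypergeometric distribution using only the Python standard library; the counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in a 2x2 table of gene counts.

                          regulated   not regulated
        new OCR nearby        a             b
        no new OCR            c             d

    Returns P(X >= a), where X is the number of regulated genes among the
    a + b genes with a new OCR, under the hypergeometric null (fixed margins).
    """
    with_ocr, regulated, total = a + b, a + c, a + b + c + d
    denom = comb(total, regulated)
    p = 0.0
    for x in range(a, min(with_ocr, regulated) + 1):
        # Ways to place x regulated genes among OCR genes and the rest among non-OCR genes.
        p += comb(with_ocr, x) * comb(total - with_ocr, regulated - x) / denom
    return p


# Hypothetical counts, for illustration only.
print(fisher_exact_greater(30, 70, 20, 880))
```

      If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should give the same one-sided p-value.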

      (2) Re: extended discussion on possible reasons for activation of specific transcription factor families. I feel it is not terribly useful since it is hardly more than guesswork. The authors should consider condensing this part to better emphasize the major (and most unexpected) large-scale regulation patterns.

      (3) Re: enrichment analysis based on significant genes (Figure 1H): even though this is common practice, there is a nuance. As we all know very well, many genes pass a significance threshold not because they are highly differentially regulated (i.e., show a large fold change), but because they are more abundantly expressed overall, so the statistical power for them is greater. Ribosomal genes are a good example: before we realized what was happening, they showed up as enriched in almost every one of our experiments, which was not very useful since their fold changes were quite trivial. I see that the authors have ribosome enrichment too, and I suspect a few more functional groups made it in because they tend to be highly expressed on average. Ideally, we want to see what is enriched among highly regulated genes, not among abundantly expressed genes. For this reason, we moved to computing enrichment based only on fold change, using the GO_MWU package (https://github.com/z0on/GO_MWU). I suggest the authors give it a try to see whether the enrichment results become more interpretable. GO_MWU is also very powerful for analyzing enrichment in WGCNA modules, in case the authors want to try that.
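      The core idea behind fold-change-based enrichment can be sketched as a Mann-Whitney U test asking whether genes in one category have systematically higher (or lower) fold changes than all other genes, with no significance cutoff. This is a simplified illustration of that idea, not GO_MWU's own code (the package adds further steps, such as handling redundant GO categories and multiple-testing correction); the normal approximation without tie correction and the toy data are assumptions.

```python
from math import erfc, sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def mwu_category_test(scores, in_category):
    """Two-sided Mann-Whitney U test of category genes vs. all other genes.

    scores: a continuous per-gene measure (e.g., log2 fold change).
    in_category: booleans marking genes annotated to the category.
    Returns (z, p) using the normal approximation (no tie correction).
    """
    ranks = average_ranks(scores)
    n1 = sum(in_category)
    n2 = len(scores) - n1
    rank_sum = sum(r for r, inside in zip(ranks, in_category) if inside)
    u1 = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return z, erfc(abs(z) / sqrt(2))


# Toy data: 20 genes; the 5 with the largest fold changes form the category.
fold_changes = [float(i) for i in range(20)]
category = [i >= 15 for i in range(20)]
print(mwu_category_test(fold_changes, category))
```

      A positive z indicates that category genes tend to have larger fold changes than the rest. Because only the ranks of fold changes enter the test, abundantly expressed genes with small fold changes gain no advantage from their greater statistical power.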

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Huifang Yan and colleagues perform RNA-seq (CEL-seq) and ATAC-seq experiments to profile the transcriptome and chromatin accessibility of sponge larvae across larval competence, settlement, and early postlarval development. Amphimedon, the sponge species they use, is amenable to laboratory experiments and can therefore be a convenient model for experimentally manipulating ecological parameters and cues that are otherwise difficult to assay. They had previously observed that light conditions at sunset (diminishing light) are critical for larvae to enter a pre-settlement stage and prime them for settlement and metamorphosis. In this paper, they report that these conditions induce a gain of accessibility at many genes, including transcription factors, and that altering these conditions by providing continuous light at sunset affects this reprogramming event.

      Strengths:

      The above is a very interesting observation, one that the authors speculate could have a broader significance and be a theme in many more larvae. I agree with the authors that this is an important finding, and I think that the paper will be interesting for a broad readership. If this is the case, the authors open up a new theme of chromatin regulation, extensively studied in mammalian contexts, but severely understudied in pretty much every other context.

      Weaknesses:

      I think, however, that their paper often reports the data in a difficult-to-follow way, and that other sorts of analyses would have made the results more accessible for a broad readership. Here, I present some suggestions that the authors might want to take into account to improve their results.

    1. eLife Assessment

      This large-scale comparative study of odorant receptor (OR) genes across more than 100 insect species, combining sequence- and structure-based approaches, aims to explore the evolution of this large gene family involved in the detection of odorant signals by olfactory neurons. This useful work uncovers a structural feature unique to the odorant receptor co-receptor Orco that reduces ligand binding affinity. However, the strength of evidence is incomplete: the pipeline for in silico identification of odorant receptor genes lacks validation through comparison with known odorant receptor repertoires from previously studied species, and claims regarding odor response spectra, evolutionary, and ecological interpretations are not fully supported by the analyses.

    2. Reviewer #1 (Public review):

      Objectives of the study and impact of the work:

      The authors of this article primarily aim to reconstruct the evolutionary history of the insect odorant receptor (OR) family, which is responsible for the detection of odorant signals by olfactory neurons. Because this multigene family evolves very rapidly and its sequences retain little phylogenetic signal, phylogenetic analyses have so far not made it possible to retrace precisely how ORs diversified before the appearance of present-day insect orders, or what the drivers of this diversification were. For example, one may suspect that the adaptation of ORs to odors emitted by plants constituted a critical step in insect evolution during the "angiosperm terrestrial revolution," which occurred at the end of the Cretaceous, but there is currently no evidence that allows this to be asserted.

      There are very nice examples, notably in Drosophilids, derived from comparisons between closely related species and documenting mechanisms of OR adaptation to certain signals. However, what the authors attempt to do in this work is to produce a macroevolutionary analysis at the scale of insects as a whole, based almost exclusively on bioinformatic analyses. To do this, they annotated OR genes in about one hundred insect species and developed pipelines for analyzing sequence similarity, structural similarity, and functional similarity, the latter being estimated through a molecular docking approach. An important feature in the evolution of insect ORs is the emergence of a unique co-receptor, called Orco, which appears to be an OR that has lost the ability to bind odorants. In addition to the large-scale bioinformatic analysis, the authors also aim to explore more specifically the factors that favored the emergence of Orco and the selective advantage conferred by the existence of OR-Orco complexes.

      Given the importance of odorant receptors in insect biology and in their adaptation to different environments and lifestyles, retracing their evolutionary history is indeed a major question in evolutionary biology. In principle, this type of work therefore has the potential to become a reference in the field and to provide a basis for significant scientific advances.

      Major strengths and weaknesses:

      The sampling chosen for collecting OR sequences is very impressive, with more than 100 insect families represented, covering most of the major orders. This sampling appears appropriate for the question being addressed. The analysis pipeline used to collect the sequences makes sense, relying on homology-based annotation tools coupled with a structure-based filter. Nevertheless, the OR counts for certain species are aberrantly low (much lower than the true repertoire size), which indicates that the pipeline probably did not function correctly for all genomes. In the absence of a validation step comparing the results with already known OR repertoires, it is difficult to estimate the overall quality of the data. The authors chose to apply a fairly stringent filter on sequence quality (based on predicted 3D structure), which reduces the number of sequences from 14,000 to 9,000. This choice seems logical given the subsequent use of these data, but it inevitably leads to data loss. The fact that some OR genes may be missing and that the total number may not be exact for each species is not prohibitive for studying the evolution of the family at a broad scale; however, it calls into question certain results that rely on this total number, such as the correlation between the number of ORs and genome size, lifestyle, and diet.

      From the dataset collected, the authors attempted to categorize ORs in several ways, starting with the reconstruction of sequence similarity networks. The approach is interesting, but in the end, the results do not seem to be sufficiently exploited, and it is not obvious what the advantage of this approach is compared with the "classical" phylogenetic approach, which generally fails to reveal homology relationships between ORs from species belonging to different insect orders. Here again, the majority of the clusters identified are "order-specific," and when this is not the case, the authors did not attempt to exploit the results. For example, clusters SeqC26 or SeqC28, which appear to be shared by many insects, are potentially very interesting. It might have been relevant to combine this similarity-based clustering approach with phylogenetic reconstructions within each shared cluster.

      The clustering based on structure also leads to the identification of a majority of "order-specific" clusters, but once again, the clusters shared by several orders are not truly exploited, which does not provide new insight into the evolution of ORs. However, the authors highlight a group of ORs in flies that appear to possess an unusual intracellular region. This is interesting, although it is a result more relevant to OR structure than to their evolution. The function of these ORs in Drosophila melanogaster, if it is known, is not discussed.

      The analysis of structural diversity then leads the authors to focus on the Orco co-receptors, which are characterized by modifications of the binding pocket and the emergence of an extracellular loop that could explain the loss of the ability to bind odorant molecules. This part, which relies on in vitro experiments, is interesting and constitutes the most striking result of the study, which could in itself have been the subject of a separate manuscript. However, as currently conducted, the molecular dynamics modelling adds little (5 ns is too short).

      The rest of the manuscript is based on the prediction of OR response spectra using molecular docking. The work that has been carried out is extremely substantial, and the objective of linking clusters based on sequence similarity or 3D structural similarity with functional categories is entirely relevant. Nevertheless, I see two major problems with this in silico functional analysis:

      (1) The docking score threshold used was chosen thoughtfully, which is very good, and according to the calculation performed, should ensure a true positive rate of more than 20%, which is excellent in such a docking analysis. But in the absence of functional validation, this 20% true positive rate is not sufficient to extrapolate OR function as the authors do in the remainder of the manuscript. The risk of error remains too high to compare in such detail the function of ORs from insects with different lifestyles or diets.

      (2) The six functional clusters identified are only slightly different from one another, with similar detection of all chemical families except acids and amines (which was expected, given that these families are a priori detected by IRs rather than ORs). This shows that even though the approach is relevant and deserves to be tested, it cannot be used to establish a link between groups/lineages of ORs and response spectra at the scale of insects as a whole. This is reflected in the final analysis by the fact that there is no visible link between sequence or structural clusters and functional clusters. Given the uncertainty surrounding the docking results, the entire subsequent analysis of the relationship between the Binding Breadth Index and ecological variables is highly questionable.

      Finally, the evolutionary analysis proposed at the end of the work suffers from an incorrect interpretation: ORs of non-holometabolous insects cannot be considered equivalent to those of species that existed before the Permian-Triassic extinction. The fact that a locust or a cockroach has more narrowly tuned ORs than holometabolous insects does not mean that this was also the case for ancestral insects. To advance this type of conclusion, it would be necessary to conduct a phylogenetic analysis and reconstruct ancestral states, which is not the case here.

      In summary, despite the large number of analyses performed, the authors do not succeed in achieving the stated objective of reconstructing the evolutionary history of insect ORs, and the results obtained do not sufficiently support the conclusions regarding the links between OR repertoires and environment or lifestyle.

    3. Reviewer #2 (Public review):

      The remarkable evolvability of the olfactory system enables animals to rapidly adapt to dynamic and chemically complex environments. Over the past two decades, substantial effort has been devoted to uncovering the evolutionary principles that drive the diversification of odorant receptors (ORs), yielding key insights into the forces shaping their striking variability in both vertebrates and insects. In this manuscript, Zhang and colleagues analyze the OR repertoires of over 100 insect species, leveraging sequence and structural similarity to infer patterns of gene family evolution within this diverse and ecologically important clade. By integrating sequence-based and structure-based comparisons, their study builds on a compelling and recently emerging line of research made possible by the advent of AlphaFold, which has previously clarified the phylogenetic relationship between insect ORs and the gustatory receptor gene family and revealed the unexpectedly deep evolutionary origins of this ancient structural fold.

      Applying this approach to a large set of ORs derived from species throughout the insect phylogeny, the authors confirm many previously reported patterns of OR evolution. Unfortunately, the way these results are presented does not make clear which findings are already known from previous work in the field and which are novel findings based on the analysis of this dataset.

      It is unclear how complete the odorant receptor sets are. I recommend benchmarking the pipeline by comparing its output to a gold-standard, extensively vetted, complete OR set, such as that of Robertson and Wanner 2006 or similar.

      Using their structural clustering approach, the authors identify a structural feature mostly unique to the OR co-receptor ORco, a beta-sheet in EL2, which they functionally show reduces odorant binding affinity - a key aspect of ORco, which does not bind ligands in the ancestral ligand-binding site. This is a particularly strong part of the manuscript, since the authors support their in silico-derived hypothesis with functional data.

      Lastly, in an attempt to assess the relationship between sequence identity and structure on one hand and function on the other, the authors perform an in silico structure prediction and chemical docking analysis. As it stands, this part is on the more speculative side since the docking approach has not been verified with available functional datasets.

    1. eLife Assessment

      The study presents useful findings on the behavioral effects of nicotine exposure, suggesting the Drosophila larva as a potential model organism for studying the underlying neural circuits. However, the evidence supporting the claims of the authors is incomplete and would benefit from more rigorous analysis and explanations. The study falls short of identifying the neural mechanisms and is therefore mainly of interest to researchers in pharmacology and behavior.

    2. Reviewer #1 (Public review):

      Summary:

      Dancausse et al. investigate behavioral responses to nicotine exposure in Drosophila larvae. They discover that high concentrations of nicotine lead to less movement and twitching, which recover slowly after several hours. Exposure to lower concentrations, however, increases locomotion and leads to hyperactive behavior. The authors also perform pharmacological and genetic manipulations to address the role of dopamine in these behavioral changes. In addition, they test the role of mushroom body (MB) intrinsic neurons by genetic silencing. Both dopamine and MB manipulations affect responses to nicotine exposure. Finally, they investigate how larvae respond to repeated exposures to nicotine and find that they do not habituate. Moreover, repeated exposure to nicotine leads to a preference for higher concentrations in a gradient assay.

      Strengths:

      The authors use rigorous behavioral analysis and discover interesting concentration and experience-dependent effects of nicotine exposure on locomotion in fly larvae, which will be worth investigating in the future to decipher the underlying neural mechanism.

      Weaknesses:

      As the manuscript currently stands, the results of genetic manipulations are hard to interpret and rather inconclusive. The genetic manipulations have been performed using broadly expressing genetic driver lines, which weakens the conclusions drawn by the authors. Thus, no specific neural populations or brain regions have been discovered, and there is little insight into the underlying neural mechanism.

      Based on gradient experiments, the authors suggest that fly larvae could serve as a model organism for addiction. This claim is quite strong, but no control experiments are shown for shorter exposures or for a single exposure with a longer resting period before the gradient test. To compare this to addiction-like behaviors, more control experiments should be performed.

      The authors should clarify how the experiments were performed in the Materials and Methods. Generally, the authors perform novel behavioral analyses, which are not explained in enough detail. Is the nicotine concentration used for most experiments relevant and comparable to those used in other studies? This information would be useful to put the results into context with other findings.

    3. Reviewer #2 (Public review):

      Summary:

      CNS function relies on a balance of excitatory and inhibitory activity. Use of addictive stimulants such as nicotine results in a chronic imbalance of these activities, and often this activity acts through dopamine pathways. To address how stimulants cause dysfunctional signaling in the DA neurotransmitter system and how this impacts neural circuit activity and behavior, the authors of this study begin to establish Drosophila larvae as a model for studying nicotine exposure.

      They focus on three questions:
      (1) In what ways does nicotine-driven hyperactivation modulate behavior?
      (2) What roles do neural circuits play in these responses?
      (3) What are the mechanisms of drug dependence and addiction-like plasticity?

      To this end, the authors use high-resolution behavioral, genetic, and pharmacological methods.

      The authors show that exposure to nicotine alters the behavioral repertoire of larval Drosophila, with effects that are long-lasting (hours) and dose-dependent. Most of the study uses a 5-minute exposure to "moderate" levels of nicotine because this dosage produces the greatest potentiation of larval crawling speed. Concomitant with increases in crawling speed, they find alterations in other behavioral parameters: crawl "efficiency" and turn rate are reduced, whereas head swings are faster and more likely to be accepted. They find that reducing the activity of dopaminergic neurons reverses the valence of the behavioral change upon exposure to nicotine. For example, crawling speed is decreased upon nicotine exposure in a Ple>Kir2.1 manipulation in comparison to controls. Moreover, they demonstrate that the effect of nicotine on the quantified set of behaviors depends on dopamine signaling. Beyond implicating dopamine signaling, they implicate the mushroom body, and particularly the gamma-neurons, in mediating responses to nicotine exposure.

      The authors further probe how nicotine exposure alters larval behavior. First, they determine what happens to crawling speed with multiple exposures, finding sustained higher crawling speeds relative to controls. Second, as a model for addiction-like behavior, they examine larval behavior on a nicotine gradient after repeated nicotine exposure. The data in Figure 7D are particularly compelling, showing that after nicotine exposure, larvae prefer high concentrations of nicotine.

      Strengths:

      In a concise set of experiments, the authors demonstrate a nicotine-induced behavioral change, its interaction with a neurotransmitter system, and a locus of action within the CNS. Thus, the authors set the stage for the use of Drosophila larvae as a model to better understand addiction-related behaviors.

      Weaknesses:

      This is a clear advance for the field of larval neurogenetics, but the extent to which it changes the way we think about nicotine exposure more generally is less clear. Nonetheless, the authors clearly achieved the goal they set out to attain.

    4. Reviewer #3 (Public review):

      Summary:

      Dancausse et al. examine behavioral responses to nicotine administration in larvae. The study first distinguishes spasms and extreme hyperexcitability elicited at high doses from a hyperactivity state triggered at lower (~1 mM feeding) doses. They then focus on the hyperactivity state and examine whether dopaminergic neuron function is involved (via transgenic and pharmacological manipulations). Next, the role of the mushroom body, a site of integration in the larval brain, is interrogated. In these studies, the authors use multiple approaches to draw complementary conclusions. The last section examines the effect of repeated nicotine exposure and of nicotine preference following repeated exposure. The findings are foundational for future studies looking to use Drosophila larvae as a system to study nicotine addiction.

      Strengths:

      Overall, I think the study is of broad importance. The neurogenetics community gets valuable insight into how ACh excitation interplays with DA signaling to regulate movement. For the addiction community, the work describes a valuable system to further interrogate genetic and environmental factors potentially driving addiction under well-controlled conditions. The quantitative analysis is generally well done, and the use of multiple experimental strategies to buttress conclusions is commendable.

      Weaknesses:

      (1) Conceptual point. Insects use ACh as the primary excitatory neurotransmitter, with nAChRs broadly expressed, while vertebrates use glutamate in this role. (Arguably, nicotine production in tobacco plants evolved as an insecticide, broadly disrupting the central excitatory neurotransmitter.) In vertebrates, central ACh neurons are relatively sparse, primarily originating from the basal forebrain.

      Based on these distinctions, it is important to consider/contrast nicotine-driven hyperexcitation from other methods to produce broad hyperexcitation (e.g., inhibition of GABA, high K+, elevated temperature, etc). Many of these methods to induce hyperexcitability would also modulate DA circuitry.

      A discussion of the role of ACh in insect vs. vertebrate brains is necessary to interpret the experimental design and findings with regard to addiction. These points should be addressed in the intro and discussion.

      (2) (Figure 1) Relatedly, how do the behaviors elicited in Figure 1B (30 or 60 mM) compare to the convulsions described following electroshock stimulation to induce a seizure? My suspicion is that you're essentially triggering a seizure (or seizures) in these larvae.

      (3) (Figure 4) Was baseline locomotion statistically compared among CS, Ple>Kir, Ple, and Kir larvae? Presumably, these manipulations would alter the intrinsic activity levels of the larvae?

      (4) (General quantitative question) How do the parameters co-vary across individuals following nicotine administration? Crawl speed and peristalsis frequency are analyzed, but turning does not seem to be considered. Do individuals that show large increases in velocity also show the largest reductions in turn rate? Are these relations preserved following the DA metabolism and MB function interventions?

      (5) (Discussion / general question) Beyond DA, other monoamines are involved in regulating larval locomotion; OA and TA are clear examples from Fox et al. (2006). Could the authors comment on whether they would expect similar findings in other neurotransmitter systems, or whether these neurotransmitter systems are involved in the ACh -> DA interplay studied here?

      (6) (Discussion) Following the establishment of nicotine preference, do larvae exhibit signs of 'withdrawal' or changes in baseline behavior when deprived of nicotine? For example, in Figure 6, does the speed following nicotine administration ever 'go below' the H2O line?

    5. Author response:

      We appreciate the extremely helpful feedback from the reviewers and editors for our manuscript. We are happy that the reviewers have appreciated what we are doing here, performing the initial work that should set the stage with Drosophila larva as a model for hyperactive stimulant response. Every comment is certainly addressable within a reasonably short time period and we look forward to improving our paper in an upcoming revision.

      We have some confusion about the “fundamental issue” of using nicotine, as we see the excitation as the fundamental effect we are studying, but we can continue to discuss and clarify this.

      We plan to make significant edits to our introduction and background sections to better frame the goals of the work, to clarify and expand on our methods, and to be more careful in any claims about neural mechanisms.

    1. eLife Assessment

      This work provides an important modeling-based framework for understanding the processes of temporal integration in the claustrum. These mechanisms could support a broader range of integrative brain functions. The manuscript presents solid evidence for how the claustrum may integrate temporally disparate signals via a novel computational phenomenon, with neural dynamics evolving along neural trajectories as opposed to settling into fixed-point attractor states.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors investigate how the anterior claustrum may integrate temporally separated task-relevant signals to guide behavior in a delayed escape paradigm. Because in vivo neural recordings from the claustrum during this task are extremely limited, comprising single-trial data with small neuronal samples, the authors adopt a modeling-driven approach. They train recurrent neural networks (RNNs) using only behavioral data (escape latency) to reproduce task performance and then analyze the internal dynamics of the trained networks. Within these networks, they identify a subset of units whose activity exhibits persistent responses and strong correlations with behavior, which the authors label as "claustrum-like." Using dimensionality reduction, decoding, and information-theoretic analyses, they argue that these units dynamically integrate conditioned stimulus (CS) and door-opening signals via nonlinear, trajectory-based population dynamics rather than fixed-point attractor states.

      To bridge model predictions and biology, the authors complement the modeling with in vitro slice experiments demonstrating recurrent excitatory connectivity and prolonged activity in the anterior claustrum that depends on glutamatergic transmission. They further compare latent neural trajectories derived from previously published in vivo claustrum recordings to those observed in the RNN, reporting qualitative similarities. Based on these results, the authors propose that the claustrum implements temporal signal integration through recurrent excitatory circuitry and dynamic population trajectories, potentially supporting broader theories of integrative brain function.

      Strengths:

      This study addresses an important and challenging problem: how to infer population-level computation in a brain structure for which in vivo data are sparse and experimentally constrained. The authors are commendably transparent about these limitations and seek to overcome them through a principled modeling framework. The integration of behavioral modeling, RNN analysis, and slice electrophysiology is ambitious and technically sophisticated.

      Several aspects stand out as strengths. First, the behavioral RNN is carefully trained and interrogated using a rich set of modern analytical tools, including cross-temporal decoding, trajectory analysis, and partial information decomposition, providing multiple complementary views of network dynamics. Second, the slice experiments convincingly demonstrate recurrent excitatory connectivity in anterior claustrum, lending biological plausibility to the model's reliance on recurrent dynamics. Third, the manuscript is clearly written, logically organized, and conceptually engaging, and it offers a coherent mechanistic hypothesis that could guide future large-scale recording experiments.

      Importantly, the work has significant heuristic value: rather than merely fitting data, it attempts to generate testable computational ideas about claustral function in a regime where direct empirical access is currently limited.

      Weaknesses:

      Despite these strengths, the manuscript suffers from a recurring and substantial conceptual issue: systematic over-interpretation of model-data correspondence. While the modeling results are potentially insightful, the extent to which they are presented as recapitulating real claustral neural mechanisms goes beyond what the available data can support.

      A fundamental limitation is that the RNN is trained solely on behavioral output, without being constrained by neural data at either single-unit or population levels. As a result, the internal network dynamics are underdetermined and non-unique. Many distinct internal solutions could plausibly generate identical behavior. However, the manuscript frequently treats the specific internal solution discovered in the RNN as if it were a close approximation of the actual claustrum circuit.

      This issue is compounded by the sparse nature of the in vivo data used for comparison. The GPFA-based trajectory analyses rely on pseudo-populations and single-trial recordings, yet are interpreted as evidence for robust population-level dynamics. Because neurons were not recorded simultaneously, the inferred trajectories necessarily lack true population covariance and shared trial-to-trial variability, limiting their interpretability as genuine population dynamics. Similarly, conclusions about trajectory-based versus attractor-based computation are drawn almost exclusively from model analyses and then generalized to the biological system.

      Overall, while the modeling framework is appropriate as a hypothesis-generating tool, the manuscript repeatedly crosses the line from proposing plausible mechanisms to asserting explanatory or even causal equivalence between the model and the brain. This undermines the otherwise strong contributions of the work.

      Below are several specific points that warrant further clarification or revision:

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      This manuscript presents an interesting and potentially valuable modeling-based framework for thinking about temporal integration in the claustrum, supported by solid slice physiology. However, in its current form, it overstates the degree to which the proposed RNN dynamics reflect actual claustral neural mechanisms. With substantial revision, especially a more cautious interpretation of model-data similarity and a clearer articulation of modeling limitations, the study could make a meaningful contribution as a hypothesis-generating work rather than a definitive mechanistic account.

      Comments on revisions:

      The authors have carefully addressed the concerns raised in the initial review. In particular, the manuscript has been substantially improved in terms of tone, conceptual clarity, and the interpretation of the modeling results. The revised version now presents a well-balanced and appropriately framed account of the work.

      The study offers a compelling and useful hypothesis-generating framework for understanding temporal integration in the claustrum, and I support its publication. As a minor point, given the acknowledged limitations of pseudo-population and single-trial data, it would be preferable to slightly soften a few remaining statements that describe trajectory structure as directly "reflecting" population-level dynamics (e.g., using "consistent with" instead).

    3. Reviewer #2 (Public review):

      This manuscript reports the behavior of a computational model of rat claustral neurons during the performance of a behavioral task known as the delayed escape task (in this reviewer's understanding, this behavioral task was created and implemented by this group only). These authors have argued in a prior manuscript (Han et al.) that a group of neurons located "rostral to striatum" are part of the claustrum. The group names the region the "rostral to striatum claustrum." Additionally, in the Han et al. paper, the authors argue that these cells are responsible for maintaining a signal that lasts through the delay period.

      The main findings of the current paper are:

      (1) The authors have built a model network that was trained to show firing similar to what was reported for rats in their prior paper.

      (2) The authors' analysis of model behavior is used to suggest that the model network recapitulates biological activity, including the existence of a cluster of cells mainly responsible for the delay period firing.

      (3) The authors offer evidence from patch clamp recordings for excitatory interconnections among claustral neurons that are an essential feature of the model network.

      A major value of the computational network is that many "trials" of the network can be run, whereas in experiments on animals only single trials are available.

      Concerns:

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), Equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), and the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions intended to "test" the model.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which to this reviewer's knowledge was the first to demonstrate such connectivity, including the long duration events and impact of planes of section.

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

      Comments on revisions:

      The authors have adequately addressed the concerns that were raised in response to the first version of the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      We thank the reviewer for their constructive and insightful comments and agree with the importance of the points raised. We recognize that aspects of our original presentation may have been unclear or overly strong in their interpretation. We have therefore revised the manuscript to clarify our intended scope, moderate our claims, and strengthen the analysis. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript. Our detailed responses are provided below.

      (1) Tone of model-data correspondence

      Numerous statements describe the RNN as "closely mimicking," "recapitulating," or being "nearly identical" to claustral neural dynamics, sometimes extending to claims about causal relationships between neural activity and behavior. Given that neural data were not used to train the model, and that only a small subset of trained networks showed the reported dynamics, these statements should be substantially softened throughout the manuscript. The RNN should be framed as providing one possible computational realization consistent with existing data, not as a close instantiation of the biological circuit.

      We agree with the reviewer’s comment. The expressions noted by the reviewer (e.g., closely mimicked, nearly identical, recapitulate) have been replaced with wording that conveys a more moderate meaning (Lines 16-17, 65-66, 83, 96, 120, 212).

      (2) Non-uniqueness of RNN solutions

      The fact that only a small fraction of trained networks exhibited "claustrum-like" clusters deserves deeper discussion. This observation raises the possibility that the identified solution is fragile or highly specific rather than canonical. The authors should explicitly discuss the non-uniqueness of internal solutions in behavior-trained RNNs, including the range of alternative network dynamics that can reproduce the same behavior. In particular, it should be clarified why the specific network exhibiting "claustrum-like" clusters is informative about claustral computation, rather than representing one arbitrary solution among many.

      As the reviewer pointed out, behaviorally trained RNNs can admit multiple internal solutions that produce the same behavioral output, and we acknowledge the non-uniqueness of such internal solutions. However, we do not interpret the fact that only a subset of trained RNNs exhibit dynamics similar to those observed in the claustrum as evidence that this solution is fragile. Notably, the claustrum-like dynamics emerged spontaneously during training and were not explicitly enforced. Furthermore, our finding suggests that the emergence of this particular dynamical regime depends on relatively specific structural constraints.

      Our criterion for selecting RNNs that could inform the computational principles of the claustrum was their ability to reproduce the behavioral and physiological observations obtained in the delayed escape experiments. RNNs that were excluded may reflect information-processing strategies used by other brain regions or may rely on artificial logical structures. The computational demand of the task, which integrates temporally separated signals, naturally drives convergence toward networks with recurrent excitatory connectivity capable of maintaining persistent activity. Indeed, all networks that exhibited a claustrum-like cluster shared a common structural feature: strong recurrent excitatory connectivity within Cluster 1. This property is consistent with biological characteristics observed in the slice experiments shown in Fig. 2.

      Importantly, the computational principles derived from this RNN were found to be quantitatively consistent with in vivo single-neuron activity patterns. Specifically, analysis using an eigenvalue-based metric (λ₃/Σλ) revealed the same directional effect in both the RNN and the claustrum neuron data. In addition, a leave-one-neuron-out analysis showed that this pattern was broadly distributed across in vivo claustral neurons rather than being driven by a small subset (see Fig. 4).

      Taken together, these convergent lines of evidence suggest that the computational model is not simply one arbitrary solution among many possible alternatives, but rather implements a computational principle that may underlie claustral functions.

      (3) GPFA trajectory comparisons

      The qualitative similarity between RNN trajectories and GPFA-derived trajectories from sparse in vivo data is interesting but insufficient to support claims of robustness or population-level structure. Statements suggesting that these patterns are unlikely to arise from noise or random fluctuations are not justified, given the single-trial, pseudo-population nature of the data. Either additional quantitative controls should be added, or the interpretation should be substantially tempered.

      As the reviewer pointed out, the GPFA trajectory comparison presented in the original manuscript remained largely qualitative, and we agree that this alone was insufficient to establish robustness or provide convincing evidence for population-level structure. In the revised manuscript, we have therefore added the requested quantitative analysis (see Fig. 4).

      Before describing the analysis, we would like to clarify several methodological limitations associated with pseudopopulation and single-trial data. GPFA estimates latent trajectories based on assumptions about covariance structure among neurons and temporal smoothness. In pseudopopulation datasets, the true simultaneously recorded covariance structure cannot be fully reconstructed, which is an inherent limitation. Because our dataset is based on single trials, the analysis does not directly exploit trial-to-trial variability. Nevertheless, the estimation of the latent space still depends on the covariance structure among real claustral neurons, suggesting that the inferred trajectories remain tied to biologically meaningful population dynamics.

      Accordingly, the quantitative metric we introduce is not entirely independent of the GPFA estimation step. Rather, it is intended to evaluate the geometric structure of the single-trial latent trajectories estimated by GPFA. We have acknowledged this limitation in the revised manuscript.

      Specifically, for the biological data, we reanalyzed the GPFA-derived latent trajectories in PCA space and computed an eigenvalue-based metric (λ₃/Σλ). For each of the 20 time bins, we applied a sliding window of 10 bins and calculated the covariance matrix within that window. The eigenvalues of PC1, PC2, and PC3 were then obtained, and the third eigenvalue (λ₃) was normalized by the total variance (Σλ = λ₁ + λ₂ + λ₃). This metric quantifies the degree to which the trajectory locally deviates from a planar structure that can be explained by two dominant axes. An increase in λ₃/Σλ indicates that the population-state trajectory forms a higher-dimensional geometric structure beyond a simple two-dimensional combination.
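      As an illustration of the metric described above, a minimal NumPy sketch follows. It assumes the latent trajectory has already been projected onto PC1-PC3 as an array of shape (time bins, 3), and that only windows lying entirely inside the trajectory are evaluated; the edge handling and the function name `lambda3_ratio` are assumptions, not details taken from the manuscript.

```python
import numpy as np

def lambda3_ratio(traj, window=10):
    """Local non-planarity of a 3-D trajectory (hypothetical helper).

    traj: array of shape (T, 3), a latent trajectory projected onto PC1-PC3.
    Returns one lambda_3 / (lambda_1 + lambda_2 + lambda_3) value for each
    window of `window` consecutive bins that fits inside the trajectory.
    """
    traj = np.asarray(traj, dtype=float)
    if traj.ndim != 2 or traj.shape[1] != 3:
        raise ValueError("traj must have shape (T, 3)")
    ratios = []
    for start in range(traj.shape[0] - window + 1):
        seg = traj[start:start + window]
        cov = np.cov(seg, rowvar=False)          # 3 x 3 covariance within the window
        eig = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues sorted largest first
        total = eig.sum()                         # sum(lambda) = lambda_1 + lambda_2 + lambda_3
        ratios.append(eig[2] / total if total > 0 else 0.0)
    return np.array(ratios)
```

      A perfectly planar trajectory yields values near zero, while one that leaves its local plane (for example, a helix) yields positive values. Because λ₃ is the smallest of the three eigenvalues, the ratio cannot exceed 1/3. The same function would apply unchanged to an RNN trajectory averaged across trials, since only this final metric, not the dimensionality-reduction step, is shared between the two systems.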

      For the RNN data, in contrast, the activity of all units can be observed simultaneously and sufficient trial repetitions are available. Therefore, GPFA was not applied; instead, PCA was performed directly on the population activity for each trial. We then computed an average trajectory across trials and applied the same λ₃/Σλ metric. Thus, although the initial dimensionality reduction steps differ between the two systems, the definition and calculation of the final quantitative metric are identical. The focus of the comparison is therefore not the dimensionality reduction technique itself, but the geometric dimensional structure of the population trajectories evolving over time.

      Importantly, within the biological dataset, the GPFA estimation procedure, preprocessing steps, pseudopopulation construction, subsampling strategy, temporal alignment criteria, and smoothing parameters were applied identically across conditions. Likewise, the same analysis pipeline was used for all conditions in the RNN. If structural biases had been introduced during covariance estimation or dimensionality reduction, they would be expected to affect all conditions within each system similarly. Nevertheless, the λ₃/Σλ value was consistently and significantly higher in the CS condition than in the Neutral condition, and this directional pattern was observed in both the RNN and the claustral neuron data. This suggests that the effect reflects condition-specific differences in population dynamical structure rather than artifacts arising from a particular dimensionality reduction method.

      To further test whether the observed effect might be driven by a small subset of neurons or specific neuron combinations, we performed a leave-one-neuron-out analysis on the claustrum dataset. Recomputing λ₃/Σλ while removing one neuron at a time showed that, in the CS group, most neurons contributed relatively evenly to this metric, whereas the Neutral group did not show such a distributed contribution pattern. This indicates that the observed three-dimensional structure is not driven by a few outlier neurons or incidental covariance patterns, but rather reflects an organized population-level phenomenon.
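      The leave-one-neuron-out procedure can be sketched as follows. This is a hypothetical illustration only: GPFA is replaced by plain PCA (via SVD) so the example stays self-contained, the helper names are invented, and a neuron's "contribution" is assumed to be the change in the window-averaged λ₃/Σλ when that neuron is removed.

```python
import numpy as np

def mean_lambda3_ratio(traj, window=10):
    # Average over sliding windows of lambda_3 / (lambda_1 + lambda_2 + lambda_3)
    vals = []
    for start in range(traj.shape[0] - window + 1):
        eig = np.linalg.eigvalsh(np.cov(traj[start:start + window], rowvar=False))
        vals.append(eig[0] / eig.sum())  # eigvalsh sorts ascending, so eig[0] is lambda_3
    return float(np.mean(vals))

def top3_pcs(rates):
    # Stand-in for GPFA: PCA via SVD of mean-centred (time bins x neurons) activity
    centred = rates - rates.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:3].T  # (time bins, 3) scores on PC1-PC3

def leave_one_neuron_out(rates, window=10):
    """Return the full-population metric and, per neuron, the drop caused by removing it."""
    full = mean_lambda3_ratio(top3_pcs(rates), window)
    deltas = np.array([
        full - mean_lambda3_ratio(top3_pcs(np.delete(rates, i, axis=1)), window)
        for i in range(rates.shape[1])
    ])
    return full, deltas
```

      Given a (time bins × neurons) array of rates, roughly uniform `deltas` would correspond to the distributed contribution pattern described for the CS group, whereas a few large values would indicate that a handful of neurons drive the effect.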

      If the result were primarily due to structural artifacts introduced by the pseudopopulation construction or dimensionality reduction procedures, it would be unlikely for consistent selective differences to repeatedly emerge between conditions under identical analysis pipelines. The consistently higher λ₃/Σλ values observed in the CS condition therefore provide indirect support that this pattern reflects condition-specific population dynamics rather than estimation bias.

      Taken together, these results suggest that the observed three-dimensional structure reflects condition-specific population dynamics rather than analysis artifacts. The fact that the same quantitative metric yields consistent effects in both the RNN and claustral data further strengthens the correspondence between the two systems.

      (4) Scope of functional claims

      The discussion connecting the findings to broad theories of claustral function, global workspace, or consciousness extends well beyond the data presented. These speculative links should be clearly labeled as such and significantly reduced in strength and prominence.

      We agree with the reviewer. We now state explicitly that references to these theories are speculative, and we have substantially reduced their emphasis and prominence in the manuscript (Lines 444-446, 451).

      (5) Comment on Conceptual Interpretation of the Behavioral Paradigm:

      The manuscript repeatedly describes the delayed escape task as an "inference-based behavioral paradigm" and states that animals "infer that a value-neutral alternative space is likely to be safer" when the CS is presented in a novel environment. While I appreciate that the US-CS association was established in a different context and that the CS is then presented in a new environment, I am not convinced that the current behavioral evidence uniquely supports an inference interpretation.

      First, it is not clear that this task is widely recognized in the literature as a canonical inference task, in the sense of, for example, sensory preconditioning, transitive inference, or model-based inference paradigms. Rather, the observed effect (that CS animals escape faster to a neutral compartment than neutral-CS controls) can be parsimoniously interpreted in terms of generalized threat value, heightened fear/anxiety, or a bias toward avoidance/escape under elevated threat, without requiring an explicit inferential step about the specific safety of the alternative compartment. The fact that no prior training is needed is compatible with flexible generalization, but does not by itself demonstrate inference in a more formal computational sense.

      Second, the inference claim becomes central to the manuscript's conceptual framing (e.g., the idea that rsCla supports "inference-based escape"), yet the behavioral analyses presented here and in the cited prior work do not clearly rule out simpler accounts. Clarifying this distinction would help avoid overstating both the inferential nature of the behavior and the specific role of rsCla and the RNN's "claustrum-like" cluster in supporting inference per se, as opposed to more general integration of threat-related signals with an opportunity for escape.

      We agree with the reviewer’s concern. We now refer to the delayed escape behavioral task as “a behavioral paradigm that requires integration of temporally separated task-relevant signals” (Lines 7-8), and we have removed the term inference throughout the manuscript (Lines 46, 51, 67, 397).

      Reviewer #2 (Public review):

      We sincerely thank the reviewer for their constructive and insightful comments. Through the revision process, the manuscript has been substantially improved, with increased reproducibility, more appropriate acknowledgment of prior work, and a clearer and more logical presentation of the study.

      (1) This paper is based on behavioral results and neural recordings from their prior paper (Han et al.), but data, e.g., in Figure 1, are not clearly identified as new or as coming from that source. Figure 1A, for example, appears to be taken directly from Han et al. No methods are given in this manuscript for the behavioral testing or the in vivo electrophysiology.

      We agree with the reviewer that this distinction should be made clearer. In the original manuscript, we indicated in the Figure 1 legend that panels A, D, E, F, and L (left) were reproduced from Han et al. (2024). To further clarify this point, we now also note this distinction explicitly in the main text (Lines 74, 85). In addition, we have described the behavioral experiments and in vivo electrophysiological recordings performed in Han et al. (2024) in the Methods section, with the appropriate citation (Lines 463-530).

      (2) Many other details are unclear. Examples include model training, the weight matrices and how these changed with training (p. 13), equations 2 and 3 (p. 13), the sources for the constants in the equations (p. 14), the methods (anesthesia, stereotaxic coordinates, injection specifics and details for "sparse expression") for the ChrimsonR injections.

      We agree with the reviewer’s comment and have revised the manuscript to provide a more detailed description of the model training procedure, weight initialization, and parameter selection.

      We expanded the explanation of the model training procedure and weight initialization. Specifically, the recurrent (W_rec) and output (W_out) weight matrices were initialized from a Glorot normal distribution, whose standard deviation is set by each matrix's fan-in and fan-out, to ensure stable signal propagation during early training. In addition, we now explicitly describe the training algorithm and optimization procedure. The network was trained using the Adam optimizer implemented in TensorFlow (v2.1.0) with a batch size of 256 for 1.2 million training iterations, minimizing the per-trial loss function defined in the manuscript.

      We also explicitly state how Dale’s principle was maintained throughout training: rows in W_out corresponding to inhibitory units were zeroed out, and recurrent weights were continuously constrained so that excitatory and inhibitory neurons preserved their respective positive and negative synaptic projections. To illustrate how the weight structure evolved during training, we explicitly reference Figure 2A, which visualizes the final mean inter-cluster synaptic weights and highlights the strong recurrent connectivity that emerged within Cluster 1.

      Regarding Equations 2 and 3 and their constants, we clarified that the target escape times used to anchor the network were based on experimentally measured behavioral latencies (48.7 s for the CS-present condition and 111.3 s for the CS-absent condition). Furthermore, the regularization coefficients (λ = 0.01 and λ_FR = 0.95) were selected through a grid search procedure to maintain biologically plausible firing rates while preventing overfitting.
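      One common way to enforce a Dale's-principle constraint like the one described above is to keep unconstrained weights and map them to sign-constrained weights before every forward pass (or after every optimizer step). The NumPy sketch below assumes a row convention in which row j of each matrix holds the outgoing weights of unit j, an illustrative 8/2 excitatory/inhibitory split, and an untruncated normal with Glorot scaling for initialization; the helper names and all sizes are hypothetical, and the authors' TensorFlow implementation may differ.

```python
import numpy as np

def dale_constrain(w_rec_raw, w_out_raw, is_excitatory):
    """Map unconstrained weights to Dale-compliant ones (hypothetical helper).

    Row j of each matrix holds the outgoing weights of unit j, so that
    h = r @ w_rec and y = r @ w_out for a row vector of firing rates r.
    """
    sign = np.where(is_excitatory, 1.0, -1.0)[:, None]  # +1 for E rows, -1 for I rows
    w_rec = np.abs(w_rec_raw) * sign                    # E units project only positive, I only negative weights
    w_out = w_out_raw * is_excitatory[:, None]          # zero the readout rows of inhibitory units
    return w_rec, w_out

def glorot_normal(rng, fan_in, fan_out):
    # Untruncated normal with Glorot scaling: std = sqrt(2 / (fan_in + fan_out))
    return rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))

rng = np.random.default_rng(0)
n_units, n_out = 10, 1
is_exc = np.arange(n_units) < 8  # illustrative: first 8 units excitatory, last 2 inhibitory
w_rec, w_out = dale_constrain(glorot_normal(rng, n_units, n_units),
                              glorot_normal(rng, n_units, n_out), is_exc)
```

      Applying the mapping on every step, rather than clipping once at initialization, keeps the sign structure intact as the optimizer updates the raw weights.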

      We have now detailed the surgical procedures that were previously omitted. This includes the specific anesthesia protocol (sodium pentobarbital, 50 mg/kg, i.p.), stereotaxic mounting, and the exact coordinates for the rsCla (AP +2.95, ML ±1.95, DV -3.85 mm). To define "sparse expression," we specified that the AAV was diluted 1:4 in sterile saline. Finally, we included the precise injection parameters: delivery at 20 nL/min via a pressure injection system, with the pipette left in place for 10 minutes post-infusion to ensure adequate diffusion. These details have been added to the Methods section (Lines 635, 636-639, 641-643).

      (3) The explorations of model behavior are a catalog of everything tried rather than an organized demonstration of what the model can and cannot do. The figures could be reduced in number to emphasize the key comparisons of the different clusters and the model's behavior under different conditions, intended to "test" the model.

      We agree with the reviewer’s comment and have reorganized the figures to focus on the key results. Specifically, we separated the original figures so that they correspond to (1) presentation of an RNN model consistent with the results of actual claustral recordings, (2) identification of dimensionality-reduced population activity patterns in the model, (3) comparison of these patterns with population activity patterns derived from recorded claustral neurons, (4) proposal of a nonlinear integration mechanism, and (5) the suggestion that such integration may be implemented through dynamic coding. Using this figure organization, we first identify RNN models trained on behavioral metrics whose dynamics are consistent with experimental claustral recordings. We then compare the dimensionality-reduced population activity patterns of these models with those derived from recorded claustral neurons to evaluate their biological plausibility. After selecting the models that satisfy this criterion, we perform further analyses that would be difficult to achieve using real neural recordings alone. These analyses ultimately allow us to propose dynamic coding exhibiting nonlinear integration as a plausible computational mechanism.

      (4) On page 6, the E-E connectivity is argued from Shelton et al. (2025) and against Kim et al. (2016), but ignores Orman (2015), which, to this reviewer's knowledge, was the first to demonstrate such connectivity, including the long-duration events and impact of planes of section.

      We agree with the reviewer’s suggestion and have added a reference to Orman (2015). We have also clarified that neuronal activity can persist for extended periods and that such persistent activity has been observed in claustral slices prepared at a specific slicing angle (Line 144).

      (5) Whereas the authors are entitled to their own opinion of prior work (references 3-8), it is inappropriate to misrepresent prior work as only demonstrating a "limited function" of claustrum. Additional papers by Mathur's group and Citri's group are ignored.

      We agree with the reviewer’s comment and have revised the relevant sentences in the Introduction. We also acknowledge the contributions of previous studies by the Mathur group and the Citri group by citing additional works from both groups (Lines 36, 429).

      In summary, the authors have made a computational model that recapitulates the firing of a subset of potentially claustral neurons during a particular behavioral task (delayed escape is certainly not the only behavior that involves claustrum - see e.g., attention, salience, sleep). If the conclusion is that excitatory claustral cells must be connected to other excitatory claustral cells, such a conclusion is not new, and the electrophysiological E-E metrics are not well quantified (e.g., connectivity frequency, strength of connection). If the model is intended to predict how the claustrum might accomplish any other task, there is insufficient detail to evaluate the model beyond the evidence that the model creates a subset of cells that can sustain firing during the delay period in the delayed escape task.

      All relevant work must be appropriately cited throughout the manuscript.

      Regarding the E-E metrics, we obtained the following result. Including recordings in which the whole-cell recording could not be completed, optogenetically evoked responses were observed in 38 of 43 patched cells (approximately 88%). This suggests that the large majority of cells receive intra-claustral excitatory input. However, the current dataset does not allow us to quantify the connection probability or the strength of these connections.

      As the reviewer pointed out, the RNN developed in this study is specifically designed for the delayed escape task, and we do not intend to claim direct generalization to other proposed functions of the claustrum, such as attention, salience, or sleep. The goal of this study is to computationally characterize the temporal integration mechanism of the claustrum observed in this specific task. We have included this in the Discussion section. In the second paragraph of the Discussion, we have explicitly acknowledged the concerns raised by the reviewer and outlined how they have been addressed in the revised manuscript.

    1. eLife Assessment

      This important advancement in the field of neurotransmission delivers a novel toolkit for in vivo visualization of vesicular transporters for ACh, GABA, glutamate and monoamines in C. elegans. With the application of newly developed neuron-specific knockout methods for these vesicular transporters, the results convincingly demonstrate that over 10% of the neurons studied show transporter co-expression that may be correlated with co-transmission. These findings and this toolkit will be of interest for the study of neural circuit function.

    2. Reviewer #1 (Public review):

      Summary:

      This study presents a novel toolkit for visualizing and manipulating neurotransmitter-specific vesicles in C. elegans neurons, addressing the challenge of tracking neurotransmitter dynamics at the level of individual synapses. The authors engineered endogenously tagged vesicular transporters for glutamate, GABA, acetylcholine, and monoamines, enabling cell-specific labeling while maintaining physiological function. Additionally, they developed conditional knockout strains to disrupt neurotransmitter synthesis in single neurons. The study reveals that over 10% of neurons in C. elegans exhibit co-transmission, with a detailed case study on the ADF sensory neuron, where serotonin and acetylcholine are trafficked in distinct vesicle pools. The approach provides a powerful platform for studying neurotransmitter identity, synaptic architecture, and co-transmission.

      Strengths:

      (1) This toolkit offers a generalizable framework that can be applied to other model organisms, advancing the ability to investigate synaptic plasticity and neural circuit logic with molecular precision.

      (2) Using this toolkit, the authors uncover molecular heterogeneity at individual synapses, reveal co-transmission in over 10% of neurons, and offer new insights into neurotransmitter trafficking and synaptic plasticity, advancing our understanding of synaptic organization.

      Weaknesses:

      (1) While the article introduces valuable tools for visualizing neurotransmitter vesicles in vivo, the core techniques are based on previously established methods. The study does not present significant technological breakthroughs, limiting the novelty of the methodological advancements.

      (2) While the discovery of co-transmission in over 10% of neurons is an intriguing finding, the article does not fully explore its potential implications or the underlying mechanisms governing this process. A deeper investigation into the functional uniqueness and interactions of neurotransmitters released from individual co-transmitting neurons, perhaps through a case study, would strengthen the study's impact.

      Comments on revisions:

      I have no further questions regarding this work. I would like to congratulate the authors on the forthcoming publication of their manuscript. This study presents a versatile methodological framework with strong potential to advance the field of neuroscience, particularly in dissecting neural circuit function and neurotransmission dynamics in vivo.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors developed fluorescent reporters to visualize the subcellular localization of vesicular transporters for glutamate, GABA, acetylcholine, and monoamines in vivo. They also developed cell-specific knockout methods for these vesicular transporters. To my knowledge, this is the first comprehensive toolkit to label and ablate vesicular transporters in C. elegans. They carefully and strategically designed the reporters, and clearly explained the rationale behind their construct designs. Meanwhile, they used previously established functional assays to confirm that the reporters are functional. They also tested and confirmed the effect of cell-specific and pan-neuronal knockout of several of these transporters.

      Strengths:

      The tools developed are versatile: they generated both green and red fluorescent reporters for easy combination with other reporters, and they established a method for cell-type-specific KO to analyze the function of the neurotransmitter in different cell types. The reagents allow visualization of specific synapses among other processes and cell bodies. In addition, they also developed a binary expression method to detect co-transmission: "We reasoned that if two neurotransmitters were co-expressed in the same neuron, driving Flippase under the promoter of one transmitter would activate the conditional reporter, resulting in fluorescence, only in cells also expressing a second neurotransmitter identity". Overall, this is a versatile and valuable toolkit with well-designed and carefully validated reagents. This toolkit will likely be widely used by the C. elegans community.

      Comments on revisions:

      The authors addressed my questions in the revised manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Cuentas-Condori et al. generate cell-specific tools for visualizing the endogenous expression of, as well as knocking out, four different classes of neurotransmitter vesicular transporters (glutamatergic, cholinergic, GABAergic, and monoaminergic) in C. elegans. They then use these tools in an intersectional strategy to provide evidence for the co-expression of these transporters in individual neurons, suggesting co-transmission of the associated neurotransmitters.

      Strengths:

      A major strength of the work is the generation of several endogenous tools that will be of use to the community. Additionally, this adds to accumulating evidence of co-transmission of different classes of neurotransmitters in the nervous system.

      Another strength is the comparison to previously published single cell sequencing data and other previously published data.

      Weaknesses:

      Co-expression of these transporters is not in and of itself sufficient to establish neurotransmitter co-release, but this caveat is acknowledged by the authors.

      Comments on revisions:

      The authors have addressed all of my previous concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study presents a novel toolkit for visualizing and manipulating neurotransmitter-specific vesicles in C. elegans neurons, addressing the challenge of tracking neurotransmitter dynamics at the level of individual synapses. The authors engineered endogenously tagged vesicular transporters for glutamate, GABA, acetylcholine, and monoamines, enabling cell-specific labeling while maintaining physiological function. Additionally, they developed conditional knockout strains to disrupt neurotransmitter synthesis in single neurons. The study reveals that over 10% of neurons in C. elegans exhibit co-transmission, with a detailed case study on the ADF sensory neuron, where serotonin and acetylcholine are trafficked in distinct vesicle pools. The approach provides a powerful platform for studying neurotransmitter identity, synaptic architecture, and co-transmission.

      Strengths:

      (1) This toolkit offers a generalizable framework that can be applied to other model organisms, advancing the ability to investigate synaptic plasticity and neural circuit logic with molecular precision.

      (2) Through the use of this toolkit, the authors uncover molecular heterogeneity at individual synapses, revealing co-transmission in over 10% of neurons, and offer new insights into neurotransmitter trafficking and synaptic plasticity, advancing our understanding of synaptic organization.

      Weaknesses:

      (1) While the article introduces valuable tools for visualizing neurotransmitter vesicles in vivo, the core techniques are based on previously established methods. The study does not present significant technological breakthroughs, limiting the novelty of the methodological advancements.

      The reviewer is correct that this study does not introduce fundamentally new molecular or imaging techniques. Rather, the goal of this work is to establish a generalizable and experimentally validated framework for investigating neurotransmission in vivo at single-cell resolution. To achieve this, we deliberately integrate robust and well-established approaches, including CRISPR-based genome engineering, endogenous tagging, intersectional labeling strategies, and behavioral genetics, into a unified toolkit that enables questions that were previously difficult to address in intact animals.

      The novelty of the work therefore lies not in the invention of individual technologies, but in their systematic integration, functional validation, and deployment to reveal new biological insights, such as the prevalence and spatial organization of co-transmission in vivo.

      (2) While the discovery of co-transmission in over 10% of neurons is an intriguing finding, the article does not fully explore its potential implications or the underlying mechanisms governing this process. A deeper investigation into the functional uniqueness and interactions of neurotransmitters released from individual co-transmitting neurons, perhaps through case study examples, would strengthen the study's impact.

      We agree with the reviewer that this study does not exhaustively explore the functional implications or mechanisms of co-transmission. The primary goal of this work is to introduce and share a validated set of strains that enable monitoring and cell-specific disruption of the major neurotransmitter systems in C. elegans, using molecular components that are broadly conserved across species. By establishing this toolkit, we aim to enable the mechanistic, single-cell analyses of co-transmitting neurons that extend beyond the scope of the present study but represent important next steps for the field.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors developed fluorescent reporters to visualize the subcellular localization of vesicular transporters for glutamate, GABA, acetylcholine, and monoamines in vivo. They also developed cell-specific knockout methods for these vesicular transporters. To my knowledge, this is the first comprehensive toolkit to label and ablate vesicular transporters in C. elegans. They carefully and strategically designed the reporters and clearly explained the rationale behind their construct designs. Meanwhile, they used previously established functional assays to confirm that the reporters are functional. They also tested and confirmed the effect of cell-specific and pan-neuronal knockout of several of these transporters.

      Strengths:

      The tools developed are versatile: they generated both green and red fluorescent reporters for easy combination with other reporters, and they established a method for cell-type-specific KO to analyze the function of the neurotransmitter in different cell types. The reagents allow visualization of specific synapses among other processes and cell bodies. In addition, they also developed a binary expression method to detect co-transmission: "We reasoned that if two neurotransmitters were co-expressed in the same neuron, driving Flippase under the promoter of one transmitter would activate the conditional reporter, resulting in fluorescence, only in cells also expressing a second neurotransmitter identity". Overall, this is a versatile and valuable toolkit with well-designed and carefully validated reagents. This toolkit will likely be widely used by the C. elegans community.

      Weaknesses:

      The authors evaluated the positions of fluorescent puncta by visually comparing their positions with the positions of synapses indicated by EM reconstruction. It would provide stronger supportive evidence if the authors also examined co-localization of these reporters with well-established synaptic reporters previously published by their lab, such as reporters that label presynaptic sites of AIY interneurons.

      We have now included images of the synaptic vesicle marker RAB-3 in neurons such as ASE (new Figure S2) and RIB (new Figure S4D). We note in the text that the patterns observed with VGLUT/EAT-4 (Figure 2E) and VGAT/UNC-47 (Figure 3D) resemble those observed in the RAB-3 images (Figures S2 and S4D, now discussed in lines 180-182 and line 244, respectively), supporting the labeling of presynaptic vesicles.

      Additionally, we now show that in the ADF neuron, a mutation in the conserved presynaptic kinesin KIF1A results in the accumulation of VAChT/UNC-17 and VMAT/CAT-1 in the cell soma and the elimination of the signal from the ADF axon (new Figure 7D-D’). These results are also consistent with the idea that these labeled transporters localize to synaptic vesicles that fail to be transported into the axon in the absence of a functional KIF1A/UNC-104 protein (lines 408-411).

      This toolkit will likely be widely used by the C. elegans community. To facilitate the adoption of the approach and method by worm labs, the authors should include their plan for the dissemination of all of the reagents included in the kit, along with all of the associated information, including construct sequences and the protocols for their use.

      We thank the reviewer for this suggestion, and in response we have now: (1) deposited all strains developed in this study with the Caenorhabditis Genetics Center, (2) created a public website with sequences and genotyping information for each allele developed (https://www.intralab.app/research-papers/cuentas-condori_etal-2026), and (3) named the toolkit SynaptoTagMe and included the name in the title and in the text. We also added the address of the public website to the main text (lines 140-142) and the Methods section (lines 540-542).

      Reviewer #3 (Public review):

      Summary:

      Cuentas-Condori et al. generate cell-specific tools for visualizing the endogenous expression of, as well as knocking out, four different classes of neurotransmitter vesicular transporters (glutamatergic, cholinergic, GABAergic, and monoaminergic) in C. elegans. They then use these tools in an intersectional strategy to provide evidence for the co-expression of these transporters in individual neurons, suggesting co-transmission of the associated neurotransmitters.

      Strengths:

      A major strength of the work is the generation of several endogenous tools that will be of use to the community. Additionally, this adds to accumulating evidence of co-transmission of different classes of neurotransmitters in the nervous system.

      Weaknesses:

      A weakness of the study is a lack of comparison to previously published single-cell sequencing data. These tools are alternatively described in the manuscript as superior to the sequencing data and as validation of the sequencing data, but neither claim can be assessed without knowing how they compare and contrast to that data. It is thus not clear to what extent the conclusions of this paper are an advance over what could be determined from the sequencing data on its own. Finally, some technical considerations should be discussed as potential caveats to the robustness of their intersectional strategy for concluding that certain genes are indeed co-expressed. Overall, claims about co-transmission should be tempered by the caveats presented in the discussion, suggesting that co-expression of these transporters is not in and of itself sufficient for neurotransmitter release.

      To clarify, we do not claim that our tools are superior to single-cell sequencing data. Rather, we view the characterization of neurotransmitter identity as an iterative process of discovery and validation across complementary approaches. Moreover, while this study provides an additional lens through which to examine neurotransmitter identity, its primary advance is not in redefining transmitter identity per se, but in establishing a toolkit that enables direct, in vivo monitoring and manipulation of neurotransmitter use at single-cell resolution.

      We do agree on the importance of explicitly comparing our findings with prior studies. In the revised manuscript we have therefore strengthened this integration by:

      (1) Revising Figure S9 and its legend to indicate the source of information for each neuron;

      (2) Adding a new Table 3 summarizing neurons consistently reported to have co-transmission potential;

      (3) Adding a new Table 4 listing neurons previously suggested to be co-transmitter neurons but not consistently supported across datasets;

      (4) Revising the Results to clarify these comparisons (lines 372-374 and 381-383); and

      (5) Incorporating this discussion into the main text (lines 482–488).

      In the Discussion we also now acknowledge technical caveats of the intersectional strategy, emphasizing that co-expression of vesicular transporters indicates co-transmission potential but is not, on its own, sufficient evidence of functional co-release (lines 482–488).

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) The design of different recombination sites for the transporters is a key strength of this paper. While the authors have provided justification and validation for the chosen sites, it would be valuable to know whether alternative insertion sites were tested as controls. A comparative analysis of multiple sites would provide important insights, especially for the design of similar sites in other proteins or in mammalian systems.

      Our paper lists all the sites tested for labeling each synaptic vesicle transporter. To summarize this information, we have added Table 5 in the Methods section (line 591).

      (2) Given the endogenous nature of the transporter design, it would be interesting to know if the authors have observed dynamic vesicle trafficking to explain the partial overlap shown in Figure 7. A dynamic approach could better capture the potential synergism and heterogeneity of co-transmission. I recommend that the authors try time-lapse imaging to explore this dynamic process further.

      We agree that dynamic imaging approaches, including time-lapse analysis of vesicle trafficking, represent an exciting avenue to further investigate the spatial and temporal organization of co-transmission. Such experiments are part of ongoing work in our laboratory and will be the focus of future studies aimed at dissecting the dynamic regulation of transmitter-specific vesicle populations in vivo.

      (3) The paper identifies co-transmission across a significant proportion of neurons, but the functional implications and interactions of neurotransmitters released from individual co-transmitting neurons are not fully explored. A case study focusing on the uniqueness and interactions of neurotransmitter release in these neurons would provide further clarity on the biological relevance of co-transmission.

      We agree with the reviewer on the importance of dissecting the functional implications of co-transmission and understanding how different neurotransmitters interact within individual co-transmitting neurons in vivo. The primary goal of this study is to establish and share tools that enable such investigations, and we anticipate that future work using these reagents will examine the functional roles of co-transmission on a neuron-by-neuron basis.

      (4) Minor Comments:

      (a) Figure S1D: The label "eat-4" in the eat-4::GFP image appears in italics.

      We have corrected this.

      (b) Figure 2C: The figure legend is missing the statistical significance notation (*** p).

      We have corrected this.

      (c) Figure 2D: The scale bar should be labeled as 10 μm.

      We have added the label.

      (d) Figure S4B: The image quality could be improved for better clarity.

      We have replaced the image.

      (e) Figure S8: The figure legend formatting needs attention, and the scale bar is missing in Figure S8C.

      We have added panel labels and the scale bar.

      Reviewer #3 (Recommendations for the authors):

      (1) A comparison of the results generated in this paper to the CeNGEN data (or other previously published data) would greatly strengthen the paper. Figure S7 seems to be a compilation of several different data sets, but if so, this is very unclear: there is no indication of which neurons come from which data set, whether there is any conflicting evidence, or what cutoffs were used to determine co-expression from CeNGEN. If there are indeed conflicting results, the ramifications should be discussed. Finally, given the caveat introduced in the discussion regarding the I2 neuron not expressing GABA synthesis or reuptake machinery, a more thorough analysis of which neurons identified here do or do not express other relevant genes may be warranted.

      In the revised version, we have added Tables 3 and 4 to explicitly compare our findings with CeNGEN and prior studies. Table 3 lists neurons consistently reported across independent datasets to have co-transmission potential, while Table 4 highlights neurons that have been suggested, but not consistently supported, across studies. We now also provide explicit references for each neuron in these tables and have clarified data sources and annotations in the legend to Figure S7 (now Figure S9). These additions are intended to make points of agreement and discrepancy across datasets transparent and to better contextualize our findings within existing resources.

      (2) The intersectional strategy used to identify co-expression of different transporters has some caveats that should be discussed. Specifically, removing the entire open reading frame of the eat-4 gene (as opposed to employing a T2A strategy) could potentially also remove some negative regulatory elements (for example, located within introns), leading to the inappropriate expression of the fluorescent reporter. This should at least be mentioned as a potential caveat.

      We have added this caveat into the discussion section (lines 511-513).

      (3) The colocalization experiments performed in Figure 7 seem to rely on the use of a transgenic allele (syb7882) that was not previously validated for functionality. This matters because another allele with a constitutive mRuby in the same position (ot907) did not seem to be fully functional in the thrashing assays (Figure S4F), so it is at least conceivable that the differences in localization are due to non-functional transporters being relegated to compartments destined for degradation. Validating this strain (after pan-neuronal Flippase expression) in the thrashing assay would dispel this concern.

      We have performed thrashing assays with allele syb7882 (UNC-17::mRuby3 GLP-on) (new Figure S6), in which we find that labeling UNC-17 with C. elegans-optimized mRuby3 (driven by pan-cellular Flippase) results in animals whose thrashing behavior is indistinguishable from that of wild-type animals. This result is consistent with the idea that the distinct subsynaptic localizations observed between VMAT/CAT-1 and VAChT/UNC-17 in ADF neurons arise from endogenous cellular subsynaptic organization programs.

      We additionally note that allele ot907 labels UNC-17 with mKate2, not mRuby3, and that this allele differs from wild-type animals in a thrashing assay (Figure S5F). The syb7882 allele that we generated labels UNC-17 with mRuby3 and is not different from wild type in a thrashing assay. We are unsure of the cause of the distinct phenotypes of ot907 and syb7882, but note that in addition to using different fluorescent proteins, the two alleles also employ distinct linker sequences between UNC-17 and the fluorescent protein (new Figure S6). We now explain this difference in the figure legend of Figure S5 (lines 1184-1189).

      Minor comments:

      (1) Is there a difference between the strains imaged in Figures 3D and S3D? If so, this is not clear. If not, why are they shown twice, and why do they look so different from each other?

      We have replaced panel S3D with an endogenous RAB-3::mScarlet marker in RIB neurons to show that the localization of this synaptic vesicle marker parallels the punctate pattern of UNC-47::gfp11x3 reconstituted specifically in RIB neurons. See new panel S4D and line 244.

      To explain the original difference: GFP1-10 is expressed from an extrachromosomal array, which results in variable expression and can account for the difference between the two panels.

      (2) Strains are alternatively denoted by their effect in the main figures, and by their allele names in the supplementary figures. This can be confusing when trying to compare data between the two figures (e.g., Figures 4C and S4F). Perhaps adding the allele names as parentheticals in the main figure might help.

      We have modified the paper to include the names of the alleles used in the panels of the main figures. Additionally, we now mention the specific alleles used for the functional assays in the figure legends.

      (3) To better understand the ramifications and efficiency of the cat-1 FLP-mediated removal (Figure 5E), it would be interesting to compare it directly to the ADF-specific removal of tph-1 referenced in the text.

      We agree that a direct comparison between the FLP-mediated removal of cat-1 and ADF-specific removal of tph-1 would be informative for assessing the efficiency and functional consequences of these manipulations. These experiments represent an interesting direction for future work, and we plan to pursue such comparisons in subsequent studies.

      (4) ADF seems to express very low levels of cho-1 (reuptake transporter), based on the images in Figure S8. Does it express higher levels of cha-1 (synthesis)?

      We have not directly compared the relative expression levels of cho-1 and cha-1 in ADF neurons in this study. Such quantitative comparisons of synthesis and reuptake machinery represent an interesting direction for future work but fall beyond the scope of the present manuscript.

    1. eLife Assessment

      In this important study, the authors used a zebrafish model and scRNAseq analysis to show that a subset of keratinocytes within the melanoma microenvironment highly up-regulate Twist and undergo Epithelial-Mesenchymal Transition (EMT). Surprisingly, when Twist is overexpressed in keratinocytes, the resulting alteration in keratinocytes is inhibitory for melanoma invasion in both zebrafish and human cell culture models. The results are supported by convincing experimental data that provide new insights into the interactions between melanoma cells and their environment.

    2. Reviewer #1 (Public review):

      Summary:

      Ma et al. show that melanoma cells induce an EMT-like state in nearby keratinocytes and that when this state is induced experimentally by Twist overexpression, the resulting alteration in keratinocytes is inhibitory for melanoma invasion. These conclusions are based on experiments in vivo with zebrafish and, in vitro, with human cells. The work is carefully done and provides new insights into the interactions between melanoma cells and their environment.

      Strengths:

      The use of both zebrafish and human cells adds confidence that findings are relevant to human melanomas while also further demonstrating the utility of the zebrafish system for discovering important new features of melanoma biology that could ultimately have clinical impacts. The work also combines a nice suite of approaches including different models for induced melanomagenesis in zebrafish, single-cell RNA-sequencing, and more. Some of the final observations are intriguing as well, especially the possibility of EMT-induced melanocyte-keratinocyte interactions via Jam3 expression; it will be interesting to see if this is indeed a mechanism for restraining melanoma invasion. The paper is clearly written and the inferences are appropriate for the results obtained. Overall the work makes a solid contribution to our understanding of important, but too often neglected, roles of the tumor microenvironment in promoting or inhibiting tumor progression and outcome.

      Weaknesses:

      No critical weaknesses noted.

      Comments on revisions:

      The authors have adequately addressed my comments and concerns.

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Ma et al. utilizes a zebrafish melanoma model, single-cell RNA sequencing (scRNA-seq), a mammalian in vitro co-culture system, and quantitative PCR (Q-PCR) gene expression analysis to investigate the role keratinocytes might play within the melanoma microenvironment. Convincing evidence is presented from scRNA-seq analysis showing that a small cluster of melanoma-associated keratinocytes upregulates the master EMT regulator and transcription factor Twist1a. To investigate how Twist-expressing keratinocytes might influence melanoma development, the authors use an in vivo zebrafish model to induce melanoma initiation while overexpressing Twist in keratinocytes through somatic transgene expression. This approach reveals that Twist overexpression in keratinocytes suppresses invasive melanoma growth. Using a complementary in vitro human cell line co-culture model, the authors demonstrate reduced migration of melanoma cells into the keratinocyte monolayer when keratinocytes overexpress Twist. Further scRNA-seq analysis of zebrafish melanoma tissues reveals that, in the presence of Twist-expressing keratinocytes, subpopulations of melanoma cells show altered gene expression, with one unique melanoma cell cluster appearing more terminally differentiated. The authors use computational methods to predict putative receptor-ligand pairs that might mediate the interaction between Twist-expressing keratinocytes and melanoma cells. Finally, the authors establish that similar phenotypic changes in keratinocytes also occur in human melanoma tissues, setting the scene for future clinically relevant studies.

      Strengths:

      The scRNA-seq approach reveals a small proportion of keratinocytes undergoing EMT within melanoma tissue. The use of a zebrafish somatic transgenic model to study melanoma initiation and progression provides an opportunity to manipulate host cells within the melanoma microenvironment and evaluate their impact on tumour progression. Solid data demonstrate that Twist-expressing keratinocytes can constrain melanoma invasive development in vivo and reduce melanoma cell migration in vitro, establishing that Twist-overexpressing keratinocytes can suppress at least one aspect of tumour progression. Using the GeoMx spatial transcriptomics platform to interrogate a series of early melanoma precursor lesions enabled the authors to demonstrate that a similar EMT phenotype in keratinocytes also occurs in humans.

      Weaknesses:

      Due to limitations of the current model, no EMT marker gene expression was examined in melanoma tissue sections to determine the proportion and localization of Twist+ve keratinocytes within the melanoma microenvironment. However, the authors compensated for this by using a spatial transcriptomics platform to interrogate a series of early melanoma precursor lesions in humans.

      Due to technical limitations, it remains to be determined whether blocking EMT through down-regulation of Twist in keratinocytes may influence melanoma development.

      Due to technical limitations, none of the gene expression changes detected through Q-PCR or scRNA-seq were examined using immunostaining or in situ hybridization, hence cellular resolution spatial information is lacking.

      Overall, the data presented in this report draw attention to a less-studied host cell type within the tumour microenvironment, the keratinocytes, which, similar to well-studied immune cells and fibroblasts, could play important roles in either promoting or constraining melanoma development. Counterintuitively, the authors show that Twist-expressing EMT keratinocytes can constrain melanoma progression. While the detailed mechanisms remain to be uncovered, this is an exciting new line of research that warrants future study.

      Comments on revisions:

      The authors have provided additional evidence to support their original conclusions, and the inclusion of spatial transcriptomic analysis using human samples strengthens the study. I did not identify any further issues that require attention.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors use the zebrafish model and in vitro co-cultures with human cell lines to study how keratinocytes modulate the early stages of melanoma development and migration. The authors demonstrate that keratinocytes undergo an EMT-like transformation in the presence of melanoma cells, which leads to a reduction in melanoma cell migration. This EMT transformation occurs via Twist and results in improved overall survival (OS) in zebrafish melanoma models. The authors suggest that the limitation of melanoma cell migration by Twist-overexpressing keratinocytes occurs through altered cell-cell interactions (Jam3b) that cause a physical blockage of melanoma cell migration.

      Strengths:

      The authors describe new cross-talk between melanoma and its major initial microenvironment, the keratinocytes, showing how keratinocytes, instructed by melanoma cells, undergo an EMT-like transformation that then controls melanoma migration.

      Overall, the paper is very well written, and the results are clearly organized and presented.

      Weaknesses:

      (1) To firmly establish their final point, it would be important to knock out Jam3b by CRISPR in melanomas with Twist-overexpressing keratinocytes, in vivo or in vitro.

      (2) Use of patient biopsies from early-stage melanomas vs healthy tissue to assess whether there is a similar alteration in the morphology of adjacent keratinocytes and an increase in vimentin in human samples would strengthen the authors' findings.

      (3) The cell-cell junctions and borders between cells (melanoma/keratinocytes) should be better characterized with cellular and sub-cellular resolution. Since a melanocyte can "touch" ~40 keratinocytes with its dendrites, can the authors expand on and better explain their model? Could this explain why, in some images, no direct interface between the cells is observed?

      Comments on revisions:

      The authors answered most of the concerns raised.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Ma et al. show that melanoma cells induce an EMT-like state in nearby keratinocytes and that when this state is induced experimentally by Twist-overexpression the resulting alteration in keratinocytes is inhibitory for melanoma invasion. These conclusions are based on experiments in vivo with zebrafish and, in vitro, with human cells. The work is carefully done and provides new insights into the interactions between melanoma cells and their environment.

      We appreciate your support for our overall conclusions.

      Strengths:

      The use of both zebrafish and human cells adds confidence that findings are relevant to human melanomas while also further demonstrating the utility of the zebrafish system for discovering important new features of melanoma biology that could ultimately have clinical impacts. The work also combines a nice suite of approaches including different models for induced melanomagenesis in zebrafish, single-cell RNA-sequencing, and more. Some of the final observations are intriguing as well, especially the possibility of EMT-induced melanocyte-keratinocyte interactions via Jam3 expression; it will be interesting to see if this is indeed a mechanism for restraining melanoma invasion. The paper is clearly written and the inferences are appropriate for the results obtained. Overall the work makes a solid contribution to our understanding of important, but too often neglected, roles of the tumor microenvironment in promoting or inhibiting tumor progression and outcome.

      Weaknesses:

      No critical weaknesses were noted.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Ma et al. utilizes a zebrafish melanoma model, single-cell RNA sequencing (scRNA-seq), a mammalian in vitro co-culture system, and quantitative PCR (Q-PCR) gene expression analysis to investigate the role keratinocytes might play within the melanoma microenvironment. Convincing evidence is presented from scRNA-seq analysis showing that a small cluster of melanoma-associated keratinocytes upregulates the master EMT regulator and transcription factor Twist1a. To investigate how Twist-expressing keratinocytes might influence melanoma development, the authors use an in vivo zebrafish model to induce melanoma initiation while overexpressing Twist in keratinocytes through somatic transgene expression. This approach reveals that Twist overexpression in keratinocytes suppresses invasive melanoma growth. Using a complementary in vitro human cell line co-culture model, the authors demonstrate reduced migration of melanoma cells into the keratinocyte monolayer when keratinocytes overexpress Twist. Further scRNA-seq analysis of zebrafish melanoma tissues reveals that in the presence of Twist-expressing keratinocytes, subpopulations of melanoma cells show altered gene expression, with one unique melanoma cell cluster appearing more terminally differentiated. Finally, the authors use computational methods to predict putative receptor-ligand pairs that might mediate the interaction between Twist-expressing keratinocytes and melanoma cells.

      Strengths:

      The scRNA-seq approach reveals a small proportion of keratinocytes undergoing EMT within melanoma tissue. The use of a zebrafish somatic transgenic model to study melanoma initiation and progression provides an opportunity to manipulate host cells within the melanoma microenvironment and evaluate their impact on tumour progression. Solid data demonstrate that Twist-expressing keratinocytes can constrain melanoma invasive development in vivo and reduce melanoma cell migration in vitro, establishing that Twist-overexpressing keratinocytes can suppress at least one aspect of tumour progression.

      Weaknesses:

      While the scRNA-seq analysis of melanoma tissue and RT-PCR analysis of EMT gene expression in isolated keratinocytes provide evidence that a subpopulation of host keratinocytes upregulates Twist and other EMT marker genes and potentially undergoes EMT, the in vivo evidence for keratinocyte EMT within the melanoma microenvironment is based on cell morphology in a single image without detailed characterization and quantification. No EMT marker gene expression was examined in melanoma tissue sections to determine the proportion and localization of Twist+ve keratinocytes within the melanoma microenvironment.

      We agree this needed better support. To address this, we have collaborated with the Sorger lab, who have performed spatial transcriptomics on early human melanoma samples (n=8 samples). The advantage of this method is that they can dissect microregions of interest (MRs) for RNA-seq to discern keratinocytes from melanocytes. We queried regions that had higher or lower numbers of atypical melanocytes in these biopsies with our TAK or TWIST signature. While the normal sample had no enrichment, we found that a subset of the human samples had evidence of these signatures in the keratinocytes, particularly the ones which had a higher proportion of atypical melanocytes. These data support our model that early melanomas enact an EMT-like program in a subset of nearby keratinocytes.

      The scRNA-seq UMAP suggests the proportion of EMT keratinocytes within the melanoma microenvironment is very small, raising questions about their precise location and significance within the tumour microenvironment. Although both in vivo and in vitro evidence demonstrates that Twist-expressing keratinocytes can suppress melanoma progression, the conditions modelled by the authors involve overexpression of Twist in all keratinocytes, which does not naturally occur within the melanoma microenvironment and, therefore, might not be relevant to naturally occurring melanoma progression. The authors did not test whether blocking EMT through downregulation of Twist in keratinocytes influences melanoma development, which would establish the role of Twist-expressing keratinocytes in the melanoma microenvironment.

      We entirely agree, and ideally would do the exact experiment you suggested, which is to knock out TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and the animals would not be expected to be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (e.g. proliferative, invasive), although we agree this would be an important question for future studies.

      To address the potential mechanism by which Twist-expressing keratinocytes suppress melanoma progression, a second scRNA-seq analysis was conducted. However, this analysis is not adequately presented to provide strong evidence for proposed mechanisms for how Twist-expressing keratinocytes suppress melanoma cell invasion. CellChat analysis was used to attempt to identify receptor-ligand pairs that might mediate keratinocyte-melanoma cell interaction, but the interactions between tumour-associated keratinocytes (TAK) and melanoma cells were not included in the analysis. Furthermore, although genetic reporters were used to label both keratinocytes and melanoma cells, no images showing the detailed distribution and positional information of these cells within melanoma tissue are presented in the report. None of the gene expression changes detected through Q-PCR or scRNA-seq were validated using immunostaining or in situ hybridization.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      Overall, the data presented in this report draw attention to a less-studied host cell type within the tumour microenvironment, the keratinocytes, which, similar to well-studied immune cells and fibroblasts, could play important roles in either promoting or constraining melanoma development.

      Counterintuitively, the authors show that Twist-expressing EMT keratinocytes can constrain melanoma progression. While the detailed mechanisms remain to be uncovered, this is an interesting observation.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors use the zebrafish model and in vitro co-cultures with human cell lines to study how keratinocytes modulate the early stages of melanoma development and migration. The authors demonstrate that keratinocytes undergo an EMT-like transformation in the presence of melanoma cells, which leads to a reduction in melanoma cell migration. This EMT transformation occurs via Twist and resulted in improved overall survival (OS) in zebrafish melanoma models. The authors suggest that the limitation of melanoma cell migration by Twist-overexpressing keratinocytes was due to altered cell-cell interactions (Jam3b) that caused a physical blockage of melanoma cell migration.

      Strengths:

      The authors describe a new cross-talk between melanoma and its major initial microenvironment, the keratinocytes, and show how keratinocytes, instructed by melanoma cells, undergo an EMT transformation that then controls melanoma migration. Overall, the paper is very well written, and the results are clearly organized and presented.

      Weaknesses:

      (1) To firmly establish their last point, it would be important to knock out Jam3b by CRISPR in melanomas with Twist-overexpressing keratinocytes, in vivo or in vitro.

      The CellChat data suggest that Jam3b is likely important in melanoma development, as it has been shown to be important in melanocyte development (Eom, Dev Biol 2021). Studying this specifically in melanoma progression is an area of ongoing study in our lab, and we have begun to generate the Jam3b knockouts as you suggested. Since this set of experiments is quite extensive, we feel this set of data deserves a separate manuscript, which we hope to complete in the near future.

      (2) The use of patient biopsies from early-stage melanomas vs. healthy tissue to assess whether there is a similar alteration in the morphology of adjacent keratinocytes and an increase in vimentin in human samples would strengthen the authors' findings.

      As noted above, we have now added human biopsy samples from the Sorger lab to our analysis, showing that the TAK/TWIST keratinocytes occur directly adjacent to the atypical melanocytes in these samples. While these early melanomas are quite difficult to obtain (most samples are used for diagnostic purposes), this provides further support to our zebrafish models.

      (3) The cell-cell junctions and borders between cells (melanoma/keratinocytes) should be characterized better, at cellular and sub-cellular resolution. Since a melanocyte can "touch" ~40 keratinocytes with its dendrites, can the authors expand on and better explain their model? Could this explain why, in some images, we cannot observe a direct interface between the cells?

      We have now added higher-resolution images of these junctions. Our overall hypothesis, related to point (1) above, is that Jam3b mediates these junctions between melanoma cells and keratinocytes, which is why we are now pursuing this in a follow-up study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Please say a little more about any phenotypes that might have been evident in Twist-overexpressing fish in the absence of melanomas, and clarify in the text that these were mosaic animals, as a first (incorrect) reading left the impression that stable lines had been made.

      In these experiments, we co-injected the melanoma plasmids along with the krt4-TWIST plasmids, creating mosaic animals. Because of this, we did not have a way of specifically looking at the effect of TWIST in the absence of melanoma. We agree this needs better clarification and have added this to the Results.

      (2) Violin plot colors in main and Supplementary Figures tend to obscure data points. Colors for keratinocyte clusters are not discernible in Figure 4C.

      We have remade the plots in a different color scheme so that these stand out more clearly.

      (3) Clarify that N-cadherin = cdh2 in Figure 1

      We have fixed this in the legend for Figure 1.

      (4) Clarify the relationship between the keratinocytes highlighted in Figure 2B and used for Hallmark expression in Figure 2B, and those analyzed for expression of candidate genes in Figure 2E. The latter shows many NKCs, whereas even the larger group circled in Figure 2B as keratinocytes seems to have far fewer cells, unless massively overplotted. Are the rest of the cells in that cluster in Fig. 2B keratinocytes as well?

      In the analysis in Figure 2E, we first calculated genes differentially expressed in the TAK vs. NKCs (found in Figure 2B). We used those genes as input into GSEA analysis, which showed enrichment for EMT programs specifically in the TAKs. We recognize that the number of TAKs is relatively small (compared to all of the other cells in the single-cell UMAP) but that is the most we were able to get from this particular scRNA run, because the melanoma cells naturally make up the vast majority of the cells in the 10X run. This is why we performed downstream mechanistic analysis (in the rest of the paper) to ensure this result was not an artifact of a small number of TAKs.

      (5) Define "NES" in the Figure 2 legend.

      NES indicates “Normalized Enrichment Score”, a standard output of GSEA. This has been added to the legend.

      (6) Indicate how many control vs. Twist+ fish were found to have invasive vs non-invasive tumors upon histological examination. Were tumors in the latter fish always contained within the epidermis proper, or did some extend deeper if given enough time?

      In the histology analysis, we used n=3 control fish and n=3 TWIST-overexpressing fish. Main Figure 3 shows n=1 of these fish from each group, and the other n=2 from each group are shown in Supplemental Figure 1. In this cohort (taken at 26 weeks), all of the TWIST tumors were contained within the epidermis, but we did not let them grow longer to see whether, given enough time, they could have invaded below this. Survival declined around 26 weeks, which made this experiment unfeasible at later time points. We have added a statement about this to the Results section.

      Reviewer #2 (Recommendations for the authors):

      Going through the data presented in the figures, here are my comments:

      (1) Figure 1: To strengthen the evidence that keratinocytes in the melanoma microenvironment undergo EMT, it would be beneficial to provide immunostaining or in situ data for EMT marker genes within melanoma tissue sections co-stained with a keratinocyte marker (such as an anti-GFP antibody).

      We agree this type of analysis is an important validation of our findings. Doing this in zebrafish tumors is difficult, as human/mouse antibodies for EMT marker genes typically do not work in fish. In addition, we felt that validating our results in human melanomas would make our findings more generalizable. Therefore, we established a collaboration with Peter Sorger’s lab, who have been performing high-resolution spatial transcriptomics on early melanoma samples from humans. While these are difficult to obtain (since most early lesions are processed for clinical diagnosis), they have a collection of n=8 samples that they subjected to GeoMx spatial analysis. In this method, the samples are first stained with antibodies to definitively mark keratinocytes (PANCK) vs. melanoma cells (SOX10), and all samples are reviewed by expert pathologists. From this, microregions (MRs) of interest are selected to then undergo RNA-seq. After control analysis to ensure both keratinocytes and melanocytes were present in the samples, they then used our TAK or TWIST signatures as a query. Both signatures were enriched in the keratinocytes adjacent to early melanomas, but not in normal skin samples or in samples with few atypical melanocytes. This provides further evidence that the altered keratinocytes we see in our fish are present and enriched in human biopsy specimens.

      (2) Figure 2: In panel B, the UMAP shows the separation of single cells, and keratinocytes are circled. However, there are two clusters of keratinocytes, and the graph does not indicate which cluster represents tumour-associated keratinocytes (TAKs) versus normal keratinocytes (NKCs). The two clusters also appear to differ in abundance, so it would be helpful to report the proportion of keratinocytes that are TAKs undergoing EMT, according to the individual dots in Figure 2E. In Figure 2E, TAKs seem to have very few cells compared to the other clusters. Given the relatively small number of EMT-TAKs detected in the single-cell RNA-seq data, I wonder how much direct influence these cells could exert on the bulk of melanoma cells in vivo. The evidence would be strengthened if an IHC analysis could show the location of Twist-expressing keratinocytes within the melanoma microenvironment and whether they are associated with certain melanoma cell markers but not others (i.e., markers indicating different differentiation states of melanoma cells). To further support the role of Twist-expressing keratinocytes in the melanoma microenvironment, it would be beneficial to perform a knockout (KO) of Twist in keratinocytes within the melanoma microenvironment.

      In Figure 2B, we agree that the color scheme made it difficult to discern TAKs vs. NKCs.

      We have changed the color scheme to make this more clear.

      The number of TAKs undergoing EMT is relatively small, and this is why we performed the overexpression studies of TWIST in order to expand the field of keratinocytes undergoing EMT. To get at the question of whether these are really important in tumor initiation and progression, we ideally would do the exact experiment you suggested, which is to knockout TWIST in the keratinocytes using CRISPR and see how this affects the tumor phenotype. However, despite our best efforts, we do not yet have an efficient method for performing knockouts in the tumor microenvironment. If we used standard 1-cell embryo transgenic approaches with a krt4-Cas9, this would severely disrupt skin development in the whole animal, and would not be expected to be viable. Theoretically, we could do this with TEAZ, but we have found that the expression of Cas9 in the microenvironment (i.e. under a krt4 promoter) is relatively inefficient. For example, we tried a krt4-Cas9 coupled with an sgRNA against GFP (as a test of the system) and this did not work well. Thus, a major goal for future studies is to develop a technology that would allow us to do this exact experiment. Finally, we do not have enough cells present in the sections to answer the question of whether the EMT keratinocytes are associated with certain melanoma cell states (i.e. proliferative, invasive), although we agree this would be an important question for future studies.

      (3) Figure 4: Co-culture results show that melanoma cells migrate further on a control HaCaT cell monolayer compared to a TWIST-overexpressing HaCaT cell monolayer. While this phenotype might support the conclusion that TWIST-expressing keratinocytes reduce melanoma cell invasion, it should be interpreted with caution. The data can be interpreted as TWIST-HaCaT cells inhibiting melanoma cell migration; however, an alternative explanation cannot be ruled out. For example, wild-type HaCaT cells might provide a suitable substrate for melanoma cells to migrate, whereas TWIST-HaCaT cells lack this property. To address this, the baseline melanoma cell migration should be established in this assay by coating the plate with cells from the same melanoma cell line and allowing melanoma cells from the flipped cover slip to migrate out.

      We have performed the experiment you suggested using Hs.294T and SKMEL2 cells and provided this as a new Supplemental Figure 2. This demonstrated that the melanoma cells in this context could indeed migrate out of the coverslip at baseline. Thus, it is possible, as you indicated, that the phenotype we have observed might be due to something lacking in the TWIST keratinocytes that promotes migration. Since we cannot differentiate between these two possibilities (i.e. that TWIST KCs actively inhibit migration vs. lacking something that promotes migration), we have modified the text to indicate both of these possible mechanisms could be at play.

      (4) In the representative images shown in the figure, it appears that both HaCaT cells and melanoma cells in the upper and lower panels are at very different densities. "Contact inhibition" and "cell sorting" are well-known phenomena in tissue-cultured cells, so when cells are seeded at different densities, their ability to move away from the initial location could vary. From the Materials and Methods section, it is unclear why cell densities are drastically different in the images presented. Images in the upper panel show both melanoma cells and keratinocytes at lower densities, and in the TWIST group, melanoma cells under the cover slip appear to aggregate into clusters with TWIST-expressing keratinocytes surrounding each aggregated cluster. This suggests that cell sorting might be occurring, potentially mediated by cadherins or Eph-ephrins.

      We recognized this discrepancy as well. In the setup of the experiment, we seeded the exact same number of cells for both the Hs.294T (Figure 4E) and SKMEL2 (Figure 4G) experiments. But when we took the images after 20 hours of co-culture, it was clear that the HaCaT densities were different, as seen in the figures. We suspect this might be because these two melanoma cell lines may secrete different factors (e.g. growth factors) that affect HaCaT proliferation, adhesion or cell sorting. Despite this, in terms of the ability of the melanoma cells to migrate into the HaCaTs, we saw similar results across both experiments, suggesting that HaCaT density alone does not explain the results. We agree that this point about cell density needs to be stated more clearly in the manuscript, and we have amended the Discussion accordingly.

      (5) Figure 5: Single-cell RNA-seq analysis comparing cells from control melanomas with cells from melanomas developed in a Twist-expressing keratinocyte background could provide valuable information on how melanoma cells alter their phenotype and how Twist-expressing keratinocytes respond to melanoma development. However, the information presented in the manuscript is not persuasive in this regard (appears to be minimal).

      (a) In Figure 5C, the differences between melanoma cells in a control background versus those in a Twist-expressing keratinocyte background include cells from more than one unique cluster, but most of the different clusters are not discussed, except for one prominent cluster indicated by an arrow.

      We pointed out that one cluster because it was the major difference between the control melanomas and the TWIST melanomas. To better clarify this point, we have made a new Supplemental Figure 3 comparing the clusters in each situation: 7 in the control melanomas vs. 8 in the TWIST melanomas (Supp. Figure 3d). To then better understand the nature of the TWIST melanomas, we performed Gene Set Enrichment Analysis (GSEA) compared to the control melanomas. Interestingly, this revealed a striking enrichment for pathways related to oxidative phosphorylation using both GO and Hallmark terms. Because we had previously shown that melanoma cells with high ox-phos are typically in the more melanocytic and less invasive state (Lumaquin-Yin, Nature Communications 2023), we then analyzed our TWIST melanomas by comparing this unique cluster to the well-annotated melanoma cell state signatures from Tsoi et al (Cancer Cell, 2018). This showed that most of the TAKs and TWIST-KCs were in the melanocytic/transitory cluster, which are thought to be the least invasive of all the melanoma cell states. Thus, it seems likely that high levels of TWIST in the keratinocytes induce a low-invasion state in the melanoma cells. We have added this data and interpretation to the Results and Discussion sections of the manuscript.

      (b) In Figure 5D, it is unclear whether TAKs include both wild-type keratinocytes and Twist-expressing keratinocytes. 

      We oversimplified this plot for the sake of visualization, but realize that in doing so we obscured some important details. In the plot, we separate normal keratinocytes (NKCs) vs. tumor associated keratinocytes (TAKs). TAKs are, by definition, TWIST<sup>hi</sup>/EMT<sup>hi</sup> and represent upregulation of endogenous TWIST. In contrast, when we force overexpression of TWIST in the keratinocytes, then we see an entirely new cluster appear, as expected. 

      (c) In Figure 5F, TAKs are interacting with melanoma cells so it is unclear why the CellChat analysis did not include TAKs. 

      This was an oversight on our part, and the Figure has now been corrected to include this. TAKs in both the control and TWIST melanomas have numerous interaction partners, whereas the TWIST-KCs have relatively fewer and more specific interactions.

      (d) Finally, Figure 5G needs clearer labelling; it is currently unclear which gene is expressed by the sender and which by the receiver.

      This has been clarified in Figure 5F with specific indicators of “sender” vs. “receiver”.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1E: in this figure, it is possible to observe the altered morphology of keratinocytes, but these cells are not in the vicinity of the melanoma cells. Can the authors please provide a zoom-in of the interface region, and quantify the distance between cells? At least in the image shown, the cells that are most deformed appear to be far away from the melanoma, but perhaps this is just this example; please clarify. Or are there patches of keratinocytes that go through EMT while others maintain their epithelial structure?

      We have now added zoom-in images of the interface (Figure 1E). In nearly all sections examined, some keratinocytes maintain their hexagonal normal epithelial structure, but the majority of the cells appear altered. We have attempted to quantify this effect, along with the distance between cells with this EMT-like morphology, but have not found a reliable method given the heterogeneity across samples. That is why we instead chose to quantify the EMT-like keratinocytes (what we refer to as TAKs) using single-cell RNA seq, which showed that 32% of the population had the TAK signature, whereas 68% resembled normal keratinocytes. We feel this is more quantitative than imaging alone.

      This data has been added to the Results section.

      (2) Figure 3B - could not find the number of fish analyzed.

      This was an oversight on our part. We studied n=135 control melanomas vs. n=118 TWIST melanomas. This data has now been added to Figure 3B.

      (3) Figure 3D - missing a graph with quantification and zoom images in the tail keratinocytes/ melanoma interface.

      In this particular cohort of animals, we unfortunately did not specifically track body vs. fin melanomas, so we are not able to quantify this.

      (4) Figure 4: again, it would be nice to have a zoom-in to observe the interface of cells; perhaps phalloidin staining could be used to better visualize how the cells are touching each other.

      We have added a zoom-in image of the interface (Figure 4E). We have very much wanted to do immunohistochemistry (not just for phalloidin, but for other markers as well) on these coverslip co-cultures and have tried, but we have not been successful. This is likely because the assay requires plastic plates, which are incompatible with this staining, but we agree that getting this to work would be an important area for future development.

      (5) I believe the paper deserves a last figure - with the model.

      We agree and this has now been added as Figure 7.

    1. eLife Assessment

      This important work advances our understanding of the single neuron coding types in the mouse gustatory cortex and the functional roles of these neurons for perceptual decision-making. The conclusions are based on compelling evidence from rigorous behavioral experiments, high-density electrophysiology, sophisticated data analysis, and neural network modeling with in silico perturbations of functionally-identified units. This work will be of broad interest to systems neuroscientists.

    2. Reviewer #1 (Public review):

      The manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One minor weakness is the mismatch between the neural analyses and behavioral data. Neural analyses (i.e. population activity trajectories) indicate a separation of the neural activity associated with each mixture. Given this analysis, one might expect the psychometric curve to have a significantly steeper slope. One potential explanation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding. Given the difficulty of iso-intensity matching concentrations, this concern is not paramount.

    3. Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I initially had some suggestions for further analyses to clarify the contribution of constrained and unconstrained units. In the revised version, the authors have performed all the suggested analyses, further strengthening their conclusions.

    4. Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision making tasks, reflecting both sensory and decision variables. In the present study, Lang et al., set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods with regard to the behavioral task and electrophysiological recordings/data analysis are straightforward, solid and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice task that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This manuscript provides several important findings that advance our current knowledge about the function of the gustatory cortex (GC). The authors used high-density electrophysiology to record neural activity during a sucrose/NaCl mixture discrimination task. They observed population-based activity capable of representing different mixtures in a linear fashion during the initial stimulus sampling period, as well as representing the behavioral decision (i.e., lick left or right) at a later time point. Analyzing this data at the single neuron level, they observed functional subpopulations capable of encoding the specific mixture (e.g., 45/55), tastant (e.g., sucrose), and behavioral choice (e.g., lick left). To test the functional consequences of these subpopulations, they built a recurrent neural network model in order to "silence" specific functional subpopulations of GC neurons. The virtual ablation of these functional subpopulations altered virtual behavioral performance in a manner predicted by the subpopulation's presumed contribution.

      Strengths:

      Building a recurrent neural network model of the gustatory cortex allows the impact of the temporal sequence of functionally identifiable populations of neurons to be tested in a manner not otherwise possible. Specifically, the authors' model links neural activity at the single neuron and population level with perceptual ability. The electrophysiology methods and analyses used to shape the network model are appropriate. Overall, the conclusions of the manuscript are well supported.

      Weaknesses:

      One potential concern is the apparent mismatch between the neural and behavioral data. Neural analyses indicate a clear separation of the activity associated with each mixture that is independent of the animal's ultimate choice. This would seemingly indicate that the animals are making errors despite correctly encoding the stimulus. Based solely on the neural data, one would expect the psychometric curve to be more "step-like" with a significantly steeper slope. One potential explanation for this observation is the concentration of the stimuli utilized in the mixture discrimination task. The authors utilize equivalent concentrations, rather than intensity-matched concentrations. In this case, a single stimulus can (theoretically) dominate the perception of a mixture, resulting in a biased behavioral response despite accurate concentration coding at the single neuron level. Given the difficulty of isointensity matching concentrations, this concern is not paramount. However, the apparent mismatch between the neural and behavioral data should be acknowledged/addressed in the text.

      We thank the Reviewer for the insightful comments and thoughtful suggestions. Our electrophysiological recordings show that GC dynamically encodes the concentration of mixture elements, the dominant perceptual quality, and directional lick decisions. With regard to the encoding of mixtures, the clear separation of activity associated with each mixture (Figure 3) is present at a trial-averaged pseudo-population level, and average activities associated with more similar, intermediate mixtures are closer to each other in this space. At the single-trial level, activities evoked by similar, intermediate mixtures are much harder to separate. This increased similarity can lead to behavioral errors resulting either from incorrect encoding of the stimulus or from an inability to interpret the stimulus to guide the correct decision. The psychometric function, which shows that more distinct stimuli (100/0 vs 0/100) lead to fewer mistakes than more ambiguous, intermediate mixtures (55/45 vs 45/55), is consistent with the increased ambiguity of responses to intermediate mixtures.

      The Reviewer is correct that there could be a slight mismatch in the perceived intensity of the mixture components. This mismatch could be the reason for the slight asymmetry in our psychometric function (Figure 1B). However, it is not uncommon for mice in these 2AC tasks to also have a motor laterality bias in their responses that manifests itself for the more ambiguous stimuli. We chose not to model this bias given its subtlety and its unknown origin. Rather, we chose to model an ideal scenario in which stimuli have matched intensity and no motor bias exists. In the revised manuscript we discuss this issue.

      Reviewer #1 (Recommendations for the authors):

      (1) The apparent mismatch between neural and behavioral data. I am providing more details in this section to hopefully better illustrate my concern.

      (a) Based on the authors' psychometric curve, sucrose appears to be a more salient signal causing the behavior to be shifted (e.g., a 50/50 mixture results in a >60% predicted behavioral performance). If both sucrose and salt were intensity-matched, a 50/50 mixture should result in a behavioral performance near 50%. The increased salience of sucrose could cause the animals to have lower overall performance despite accurate neural encoding. Alternatively, certain animals could display a strong side bias, skewing the data slightly. These issues have seemingly been fixed in the model data, which displays a more balanced psychometric curve. Accordingly, the model data seemingly displays a larger shift in error trials as compared to correct trials (Figure 6A).

      The reviewer is correct in observing that the average experimental psychometric curve in Figure 1B shows a slight shift in favor of the sucrose side with a 50/50 mixture. We fit psychometric curves to each session and the mean value of P(Sucrose choice | Stimulus = 50/50) across sessions was significantly different from 0.5 (one-sample t-test, p = 0.003), with 5 probabilities below 0.5 and 18 above it.

      This slight bias could be attributed to a slight mismatch in the perceived intensity of the mixture components and/or lateral motor biases. In any case, it is subtle and its origins were not a focus of this study.

      Models were not trained to match the animals’ psychometric curves, but rather to choose correctly in an ideal scenario where stimuli have matched intensities. This explains why the model simulations lack the bias observed in animal behavior data.

      We do not believe that there is a mismatch between the experimental behavioral and neural data, as trial-averaged pseudo-population trajectories are farther in neural space for more discriminable stimuli and closer in neural space for more similar stimuli, consistent with behavioral performance that is high for more discriminable stimuli and low for more similar stimuli. Moreover, as the model also shows, a clear separation of trial-averaged trajectories still results in a sigmoidal performance function for trial-to-trial behavior.

      Finally, subtle behavioral biases would not necessarily be expected to appear in our dPCA analyses since we used this technique to find a single axis that best separates all stimuli conditions regardless of choice when the pseudo-population data are projected upon it. Additional modes of activity that explain less overall variance might better reflect biases.

      (b) Although I am not an expert at these analyses, I wonder whether the elevated bump (i.e., >0) in Figure 3C of the 55/45 mixture that occurs early in the stimulus presentation further supports the hypothesis mentioned above and could indicate an early signal of salience/increased intensity?

      The reviewer is correct that the 55/45 trajectory features a brief positive wave right after stimulus delivery before going negative. While this may be related to stimuli not being explicitly balanced for intensity, it could also reflect a signal related to ambiguity or balanced mixtures. We are hesitant to interpret this positive deflection as conclusive evidence of a bias in neural activity, given its short duration and the natural variability of neural signals.

      (2) The increase in step-perception neurons after the decision period is confusing (Figure 4C). The text states (line 246) "the analysis reveals a small and time-invariant proportion of step-perception neurons". However, the proportion doubles after the decision-making process, which is seemingly a significant change. Why does this occur? This observation is noticeably missing from the network data. Could it be attributed to a mislabeling of "step-choice" neurons, given the correlation between the left/right decision and sweet/salty? Either way, it is very noticeable and should be addressed.

      We cannot be sure of the reason for the increase in step-perception neurons after decisions. One possibility is that they are acting as feedback for learning, encoding the percept to compare with choice and outcome to improve performance. The model, which presumably learns the task differently from the animals, does not seem to leverage this signal for its own learning. We have modified the text, now referring to a “small but consistently present proportion” of step-perception neurons, and included this proposed explanation in the Discussion.

      (3) Optional: I think the authors are missing an opportunity to analyze the temporal aspect of this multiplex code using their network-based modeling approach. A significant proportion of neurons fall into different categories (i.e., step-perception/linear, etc.) at different time points. However, the virtual ablation experiments remove any neuron that falls into one of these categories at any time. By limiting the cell-specific virtual ablation to specific time windows, you could (I think) provide stronger evidence for the temporal sequence of the encoding of these perceptual aspects.

      This was an excellent suggestion for an additional modeling experiment, so we performed it. A new supplemental figure (Figure S8) and additional text in the revised manuscript showcase the results. In summary:

      In terms of behavioral results, ablating the linear coding units in the beginning (that is, silencing all units that are labeled linear in any bin within the first 1.2 s after stimulus onset for the entirety of the 1.2 s) significantly reduces performance, as does ablating the step-perception or step-choice coding units at the end (1.2 s prior to choice). The remaining combinations of coding type and timing of the ablation do not affect performance.

      Regarding the dynamics of coding types (compare Figure 7A), stimulus coding activity was significantly blunted only by ablating the linear coding units in the beginning, whereas choice coding activity was diminished by ablating the choice coding units at the end or by ablating the linear coding units at either the beginning or the end.
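      The time-windowed ablation described above can be sketched as a mask over units and time bins. This is a minimal illustration only: the label format (one coding-type string per unit per bin), the bin times, and the function name `windowed_ablation_mask` are hypothetical, and multiplying unit outputs by such a mask is an assumed way of silencing units, not necessarily how the model ablations were implemented.

```python
def windowed_ablation_mask(labels, bin_times, target, t_start, t_end):
    """Build a silencing mask for a time-restricted ablation (hypothetical sketch).

    labels[u][b] is the coding type of unit u in time bin b (e.g. "linear").
    A unit is silenced (mask 0.0) in every bin inside [t_start, t_end) if it
    carries `target` in ANY bin inside that window; all other entries are 1.0.
    """
    in_window = [t_start <= t < t_end for t in bin_times]
    mask = []
    for unit_labels in labels:
        hit = any(lab == target and w for lab, w in zip(unit_labels, in_window))
        mask.append([0.0 if (hit and w) else 1.0 for w in in_window])
    return mask


# Toy example: two units, five bins (times in s relative to stimulus onset).
bin_times = [0.0, 0.4, 0.8, 1.2, 1.6]
labels = [
    ["other", "linear", "other", "other", "linear"],  # linear inside the window
    ["other", "other", "other", "linear", "other"],   # linear only from 1.2 s on
]
mask = windowed_ablation_mask(labels, bin_times, "linear", 0.0, 1.2)
print(mask)  # [[0.0, 0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]]
```

      For the end-of-trial condition, the same function could be called with bin times aligned to the choice and a window such as [-1.2, 0) s.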

      Reviewer #2 (Public review):

      Lang et al. investigate the contribution of individual neuronal encoding of specific task features to population dynamics and behavior. Using a taste-based decision-making behavioral task with electrophysiology from the mouse gustatory cortex and computational modeling, the authors reveal that neurons encoding sensory, perceptual, and decision-related information with linear and categorical patterns are essential for driving neural population dynamics and behavioral performance. Their findings suggest that individual linear and categorical coding units have a significant role in cortical dynamics and perceptual decision-making behavior.

      Overall, the experimental and analytical work is of very high quality, and the findings are of great interest to the taste coding field, as well as to the broader systems neuroscience field.

      I have a couple of suggestions to further enhance the authors' important conclusions:

      My main comment is the distinction between constrained and unconstrained units. The authors train a small percentage of units to match the real neural data (constrained units), and then find some unconstrained units that are similar to the real neural data and some that are not. As far as I could tell, the relative fraction of constrained and unconstrained units in the trained RNN is not reported; I assume the constrained ones are a much smaller population, but this is unclear. The selection of different groups of neurons for the RNN ablation experiments appears to be based on their response profiles only. Therefore, if I understood correctly, both constrained and unconstrained units are ablated together for a given response category (e.g., linear or step-perception). It would be useful, therefore, to separately compare the effects of constrained vs. unconstrained RNN units.

      We thank the Reviewer for the constructive feedback. The Reviewer is correct that ablations were carried out with respect to response categories only and included both constrained and unconstrained units.

      The ratio of total units to constrained units was fixed at 5.88, thus constrained units were ~17% of the network and unconstrained units were ~83%. This value is specified in the Methods (RNN: Components and dynamics), and for clarity we now also report it in the Results of the revised manuscript.

      We have also edited the Methods because they wrongly stated that the ratio of unconstrained (rather than total) units to constrained units was 5.88.

      Specifically:

      (1) For the analyses in the initial version of the manuscript, the authors should specify how many units in each ablation category are constrained and unconstrained.

      In the revised manuscript, we have specified the numbers of constrained and unconstrained units within each response category. For convenience, they are reported here: linear = 194 constrained and 691 unconstrained units; step-perception = 147 constrained and 840 unconstrained units; step-choice = 129 constrained and 814 unconstrained units; “other” = 353 constrained and 1739 unconstrained units.
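      As a quick consistency check, the per-category counts above can be turned into constrained shares and compared with the network-wide design ratio (total/constrained = 5.88, i.e., about 17% constrained). The snippet below only does arithmetic on the reported numbers; because it is not stated here whether a unit can fall into more than one category, shares are compared per category rather than summed.

```python
# (constrained, unconstrained) unit counts per response category, as reported above.
counts = {
    "linear": (194, 691),
    "step-perception": (147, 840),
    "step-choice": (129, 814),
    "other": (353, 1739),
}

design_share = 1 / 5.88  # total-to-constrained ratio of 5.88 -> ~0.170 constrained

shares = {cat: c / (c + u) for cat, (c, u) in counts.items()}
for cat, share in shares.items():
    print(f"{cat:<16} constrained share = {share:.3f} (design: {design_share:.3f})")
```

      On these numbers the constrained share ranges from about 14% (step-choice) to about 22% (linear), bracketing the ~17% network-wide value.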

      (2) The authors should repeat Figure 6, but only for unconstrained units to test how much of the effects in the initial version of Figure 6 are driven by constrained vs. unconstrained RNN units.

      In the revised version we have included two additional supplemental figures (Figures S5-6) where the analyses of Figure 6 are carried out separately for constrained and unconstrained units. In short, the results for the constrained units strongly resemble those for the experimental data, while the results for the unconstrained units strongly resemble those for all model units.

      (3) The authors should repeat Figure 7, but performing ablations separately on the constrained and unconstrained units to examine how the network behaves in each case and the resulting "behavioral" effect.

      The revised version includes a supplemental figure (Figure S7) with the results of these additional ablation simulations.

      In summary:

      In terms of behavioral performance, the prior results showing that ablating linear, step-perception, or step-choice units significantly impairs performance, while ablating “other” has no significant effect, hold even if ablation is restricted to only constrained or only unconstrained units. There is a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs performance more, most likely due to their larger population size.

      In terms of dynamics, impairing stimulus coding by ablating step-choice units requires ablating both the constrained and the unconstrained step-choice units; for linear or step-perception units, however, ablating just the unconstrained ones suffices. As before, ablating linear, step-perception, or step-choice units significantly impairs choice coding activity, while ablating “other” units does not; these results hold even if ablation is restricted to only constrained or only unconstrained units. Finally, there is again a significant main effect of constrained vs unconstrained; on average, ablating the unconstrained population impairs dynamics more, most likely due to the larger population size.

      Reviewer #2 (Recommendations for the authors):

      (1) In addition to panel 5B, it would be informative to show data from individual mice and the corresponding RNNs trained on each mouse, to assess how closely they match. If available, including one representative example of a good match and one of a less accurate match would help the reader get a better sense of the data.

      Figure 5B shows the average behavioral performance of the model. Individual models were not trained directly on the psychometric curves of experimental sessions; they were trained to perform the task correctly. After successful training, model simulations were run with input noise to be able to produce a sigmoidal psychometric curve. However, although the input noise was tuned to capture the overall correct rate of the corresponding experimental session, we did not attempt to match the details of the psychometric curve. See also the next reply.
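      The role of input noise here can be illustrated with a deliberately simple stand-in for a trained model: an ideal observer that compares the sucrose fraction of the mixture with 0.5, with Gaussian noise added to its input. The noise model, the stimulus set, and the names `p_sucrose_choice`, `correct_rate`, and `tune_sigma` are assumptions for illustration, not the authors' RNN or their tuning procedure. Even though the noiseless inputs for neighboring mixtures are perfectly separated, the trial-to-trial choice probability is a smooth sigmoid, and a single noise level can be tuned to hit a session's overall correct rate.

```python
import math


def p_sucrose_choice(s, sigma):
    """P(sucrose choice) when sucrose fraction s is seen with Gaussian noise (SD sigma)
    and the observer picks sucrose whenever the noisy value exceeds 0.5."""
    z = (s - 0.5) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def correct_rate(sigma, stimuli):
    """Mean accuracy over stimuli that have a correct side (s != 0.5)."""
    accs = []
    for s in stimuli:
        if s == 0.5:
            continue  # a 50/50 mixture has no correct answer
        p = p_sucrose_choice(s, sigma)
        accs.append(p if s > 0.5 else 1.0 - p)
    return sum(accs) / len(accs)


def tune_sigma(target, stimuli, lo=1e-3, hi=10.0, iters=60):
    """Bisection on sigma; accuracy decreases monotonically as noise grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if correct_rate(mid, stimuli) > target:
            lo = mid  # still too accurate: more noise needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical stimulus set (sucrose fraction of the mixture).
stimuli = [0.0, 0.45, 0.5, 0.55, 1.0]
sigma = tune_sigma(0.80, stimuli)
curve = [round(p_sucrose_choice(s, sigma), 3) for s in stimuli]
print(sigma, curve)
```

      Matching only the overall correct rate, as in this sketch, constrains one number per session and leaves the finer shape of the curve free, which is consistent with the reply's point that the details of each session's psychometric curve were not matched.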

      (2) In addition to panel 5C, it would be useful to add examples of experimentally observed PSTHs and the corresponding activity trajectory for the units in the RNN trained to match them, for all the other coding patterns (step-perception and step-choice).

      We note that the PSTH in 5C is not an example of a linear coding unit as the Reviewer implies, but simply one with a good fit, and here the model’s output was produced in the absence of input noise. In order to classify step-perception and step-choice responses one needs error trials, but the model was trained without this input noise that induces errors (and produces a sigmoidal psychometric function) to match experimental PSTHs from correct trials only. Post-training simulations were then run with input noise to induce error trials, and model unit response profiles were classified based on this. However, there is no guarantee that error trials in the model match the error trials in the experiment; therefore, step-perception and step-choice units in the model may or may not be step-perception and step-choice units in the data. Despite this limitation, the revised manuscript includes additional examples, in Figure S2, of experimentally observed PSTHs and their corresponding model activity, to supplement Figure 5C and provide a better sense of the goodness-of-fit.

      (3) Electrophysiological data in Figure 2 - It would be helpful to provide statistics on how many neurons change their activity in each session.

      In the revised manuscript we have included across-session statistics for proportions of neurons that are taste-responsive and that show decision preparatory activity. We have also included tables (Tables S1 and S3) with the numbers of neurons that are taste-responsive and that show preparatory activity for each session in the experimental and model data.

      (4) Peak auROC selection - How was the peak auROC selected? Selecting only one bin for the peak could be potentially problematic and may result in the incorrect identification of an outlier that does not faithfully represent the neuron's overall activity. The peak selection could instead be based on several consecutive bins showing a consistent trend. If this approach was already implemented, the authors should explicitly describe it in the Methods section.

      Peak auROC was selected from a single bin (with an average duration of about 50 ms). While it is true that this may result in outlier neurons that transiently prefer one stimulus strongly but more consistently prefer the other, we opted for a simple criterion to sort the neurons into two categories for visualization. Adopting more stringent criteria that consider multiple bins may result in neurons that cannot be placed in either category, and we wanted a way to examine the entire pseudo-population. Also, the entire auROC trace is visualized in the heatmap, so potential outliers are not hidden and can be assessed by eye.
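      The difference between a single-bin peak and the reviewer's consecutive-bin alternative can be sketched as follows. The pairwise auROC computation, the same-side run criterion, and the `min_run` parameter are illustrative assumptions rather than the procedure in the Methods; with `min_run=1` the sketch reduces to a single-bin peak.

```python
def auroc(a, b):
    """auROC for discriminating samples a from samples b: the probability that a
    random value from a exceeds a random value from b (ties count as 0.5)."""
    wins = 0.0
    for x in a:
        for y in b:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(a) * len(b))


def peak_bin(auc_trace, min_run=1):
    """Index of the bin with the largest |auROC - 0.5|.

    Only bins lying in a run of at least `min_run` consecutive bins on the same
    side of 0.5 are eligible (bins at exactly 0.5 count as the low side), so
    min_run=1 is a plain single-bin peak. Returns None if no bin qualifies.
    """
    n = len(auc_trace)
    high = [v > 0.5 for v in auc_trace]
    best, best_dev = None, -1.0
    for i in range(n):
        start = i
        while start > 0 and high[start - 1] == high[i]:
            start -= 1
        end = i
        while end < n - 1 and high[end + 1] == high[i]:
            end += 1
        if end - start + 1 < min_run:
            continue  # bin sits in too short a same-side run
        dev = abs(auc_trace[i] - 0.5)
        if dev > best_dev:
            best, best_dev = i, dev
    return best


# Hypothetical trace: one strong transient bin above 0.5, a sustained run below.
trace = [0.45, 0.9, 0.4, 0.35, 0.3, 0.45]
print(peak_bin(trace))             # 1 (single-bin peak: the transient)
print(peak_bin(trace, min_run=3))  # 4 (peak within a sustained run)
```

      A neuron's preferred stimulus would then be read off as the side of 0.5 at the chosen bin. The stricter variant trades robustness to transients for leaving some neurons unclassified (the `None` case), which is the trade-off described in the reply.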

      Reviewer #3 (Public review):

      Primary taste cortex neurons show a variety of dynamic response profiles during taste decision-making tasks, reflecting both sensory and decision variables. In the present study, Lang et al. set out to determine how neurons with distinct response profiles contribute to perceptual decisions about taste stimuli.

      The methods, with reference to the behavioral task and electrophysiological recordings/data analysis, are straightforward, solid, and appropriate. The computational model is presented in a clear and conceptually intuitive manner, although the details are outside of my area of expertise.

      The experimental design features a simple 2-alternative forced-choice task that yielded clear psychometric curves across a range of stimuli. In vivo recordings were performed using Neuropixels and yielded an appropriate sample of single neuron responses. The strength of the model lies in the fact that it consists of single neurons whose response profiles mimic those recorded in vivo, and allows neuron-selective manipulation.

      By virtually lesioning specific subsets of neurons in the network, the authors demonstrate that a relatively small population of neurons with specific tuning profiles was sufficient to produce the observed neural dynamics and behavioral responses. This effect was selective as lesioning other responsive neurons did not affect overall response dynamics or performance.

      These findings provide new insight into the relation between the response profiles of single neurons in sensory cortex, their population-level activity dynamics, and the perceptual decisions they inform.

      The approach is particularly innovative as it uses computational modeling to target functionally defined "cell types", which cannot necessarily be targeted by more conventional genetic approaches.

      We thank the Reviewer for the positive assessment of our study.

      Reviewer #3 (Recommendations for the authors):

      (1) Introduction: I'm missing a clearly stated specific hypothesis and what is predicted on the basis of that hypothesis. What is the alternative?

      The null hypothesis is that single neuron activity patterns, even when clearly structured, do not matter for population activity or behavior. Alternatively, they do matter for these phenomena, and our model supports the alternative hypothesis. We have made this hypothesis clearer in the Introduction.

      (2) Discussion: Much of the text is a recap of the Introduction and Results sections. Please elaborate on the specific insights gained from the findings. The idea that tuned neurons in the sensory cortex are the basis for perception and perceptual decisions concerning the features being represented by those neurons is generally accepted. What the present study adds to this insight could be described more explicitly. On the other hand, the idea that small populations of tuned neurons are responsible for perception of taste/perceptual decisions about taste appears in contrast with previous accounts where stimulus features/decisions are reflected in correlated changes in activity across distributed populations of taste cortical neurons, including ones that are not necessarily tuned or even overtly responsive. How do the present findings relate to this idea?

      This is a very good point about reconciling these findings with past ones that have focused on coordinated changes across ensembles of neurons, i.e., metastable dynamics of internal (hidden) states. There is a brief mention of metastability toward the end of the Discussion, but we agree it deserves elaboration.

      This work does emphasize single unit activity, but in the context of, and as relevant to, population activity. We believe that the findings and frameworks of previous studies and those presented here are compatible rather than mutually exclusive. There is no reason why neurons with the coding patterns we studied here cannot coordinate with others to participate in the formation of different metastable states. Which of the two (neurons with specific response profiles, or ensemble activity patterns that may involve these neurons) is necessary and sufficient for producing perception and behavior during the mixture-based decision-making task is an interesting question, but a difficult one to answer because single units contribute to both alternatives. Differentiating between them would require a manipulation that disrupts ensemble coordination without disrupting single unit activity. We have made these points clearer in the Discussion.

      (3) Results: RNNs were based on data from single sessions -- how many neurons of each tuning type were observed in each session? In particular, there were 23 sessions but only 25 neurons total tuned to choice, suggesting that modelled choice neurons were based on ~1 neuron.

      The revised manuscript includes the session-by-session breakdown of response types for both experiment and model in two supplementary tables (Tables S2 and S4). We note that there are 25 neurons tuned to choice during the last 500 ms of the trial prior to decision, but 114 out of 626 neurons in total are tuned to choice in some time bin in the experimental data.

      (4) Minor: Indicate the time windows used for analysis of stimulus sampling, delay, and choice on the figures.

      The revised manuscript now includes the illustration of sampling and delay windows in Figure 2C-D, since we averaged the values over these windows for use in a 2-way ANOVA. All other figures either are associated with bin-by-bin analyses and have the first central and lateral licks (T and D) indicated, or have the time windows specified (e.g., Figure 4B, which uses [T, T + 0.5 s] and [D - 0.5 s, D]).

    1. eLife Assessment

      This study presents valuable findings on the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence. The evidence supporting the claims of the authors is solid, although a more exhaustive characterisation of how the different signals interact would have strengthened the study. The work will be of interest to cognitive and systems neuroscientists working on decision-making.

    2. Reviewer #1 (Public review):

      Summary:

      This paper characterises the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence, with a focus on the centroparietal positivity and motor beta lateralization. The main finding is that the centroparietal positivity builds up during evidence accumulation but falls back to baseline during gaps, while motor beta lateralization maintains a continuous, sustained representation throughout the gap and until response.

      Strengths:

      - Elegant combination of electroencephalography and computational modelling.

      - Innovative task design, including parametric manipulation of gap duration.

      - The authors describe results of two separate experiments, with very similar results, in effect providing an internal replication.

      Weaknesses:

      - A direct characterization of how the centroparietal positivity and motor beta lateralization interact is missing, which limits the novelty. In their reply to reviewers, the authors argue that the signal-to-noise ratio of EEG signals is insufficient for such analyses at the single-trial level. If so, a binned or trial-averaged approach could still be attempted.

      - An exhaustive characterisation of sensors and frequency bands is also missing. In their reply to reviewers, the authors suggest that this would detract from their hypothesis-driven focus. I disagree: the main hypothesis and figures could remain centred on the centroparietal positivity and motor beta lateralization, with a more comprehensive mapping of sensors and frequencies placed in supplementary material. Since the purpose of the paper is to examine EEG-based decision signals in a novel behavioural context, a broader characterisation of the underlying EEG landscape would seem appropriate.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript examines decision-making in a context where the information for the decision is not continuous, but separated by a short temporal gap. The authors use a standard motion direction discrimination task over two discrete dot motion pulses (but unlike previous experiments, fill the gaps in evidence with 0-coherence random dot motion of differently coloured dots). Previous studies using this task (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023) or other discrete sample stimuli (Cheadle et al., 2014; Wyart et al., 2015; Golmohamadian et al., 2025) have shown decision-makers to integrate evidence from multiple samples (although with some flexible weighting on each sample). In this experiment, decision-makers tended not to use the second motion pulse for their decision. This allows the separation of neural signatures of momentary decision-evidence samples from the accumulated decision-evidence. In this context, classic electroencephalography signatures of accumulated decision-evidence (central-parietal positivity) are shown to reflect the momentary decision-evidence samples.

      Strengths:

      The authors present an excellent analysis of the data in support of their findings. In terms of proportion correct, participants show poorer performance than predicted if assuming both evidence samples were integrated perfectly. A regression analysis suggested a weaker weight on the second pulse, and in line with this, the authors show an effect of the order of pulse strength that is reversed compared to previous studies: A stronger second pulse resulted in worse performance than a stronger first pulse (this is in line with the visual condition reported in Golmohamadian et al., 2025). The authors also show smaller changes in electrophysiological signatures of decision-making (central parietal positivity, and lateralised motor beta power) in response to the second pulse. The authors describe these findings with a computational model which allows for early decision-commitment, meaning the second pulse is ignored on the majority of trials. The model-predicted electrophysiological components describe the data well. In particular, this analysis of model-predicted electrophysiology is impressive in providing simple and clear predictions for understanding the data.

      Weaknesses:

      Some readers may be left questioning why behaviour in this experiment is so different from previous experiments which use almost exactly the same design (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023). Overall performance in this experiment was much worse than in previous experiments: participants achieved ~85% correct following 400 ms of 33–45% coherent motion, whereas in previous work performance was ~90% correct following 240 ms of 12.8% coherent motion. A second weakness is that, while the authors present a model which describes the data based on premature decision commitment, they do not examine the explanation from the existing literature that evidence is flexibly weighted, and do not provide any analyses that could be used to compare these descriptions. While their model can describe the data in this manuscript, it cannot explain the data from previous experiments showing a stronger weight on the second pulse.
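      The two accounts contrasted in this review make different predictions about when the second pulse can matter. A deliberately stripped-down sketch of each is given below: an early-commitment accumulator that stops integrating once a bound is reached, and a flexible-weighting rule that always combines both pulses with separate weights. Both functions, their noise-free form, and the parameter names (`bound`, `w1`, `w2`) are illustrative assumptions; neither is the authors' fitted model nor a model taken from the cited studies.

```python
def early_commitment(pulse1, pulse2, bound):
    """Noise-free bounded accumulator over two evidence pulses (hypothetical sketch).

    Samples are summed in order; once |DV| reaches `bound` the choice is fixed
    and later samples (including the whole second pulse) are ignored.
    Returns (choice, committed_in_pulse1); choice is +1 or -1 (ties -> -1).
    """
    dv = 0.0
    for e in pulse1:
        dv += e
        if abs(dv) >= bound:
            return (1 if dv > 0 else -1), True
    for e in pulse2:
        dv += e
        if abs(dv) >= bound:
            break
    return (1 if dv > 0 else -1), False


def flexible_weighting(pulse1, pulse2, w1, w2):
    """Both pulses always contribute, with separate weights (ties -> -1)."""
    dv = w1 * sum(pulse1) + w2 * sum(pulse2)
    return 1 if dv > 0 else -1


# A strong first pulse followed by a contradicting second pulse.
p1, p2 = [0.5, 0.5, 0.5], [-1.0, -1.0]
print(early_commitment(p1, p2, bound=1.0))         # (1, True): pulse 2 ignored
print(flexible_weighting(p1, p2, w1=1.0, w2=1.0))  # -1: pulse 2 outweighs pulse 1
```

      In this toy form, early commitment makes the second pulse irrelevant whenever the first pulse alone reaches the bound, whereas flexible weighting lets it matter on every trial in proportion to `w2`; fitting both to the same choice data is one way the comparison raised above might be made.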

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper aims to characterise the physiological and computational underpinnings of the accumulation of intermittent glimpses of sensory evidence.

      Strengths:

      (1) Elegant combination of electroencephalography and computational modelling.

      (2) The authors describe results of two separate experiments, with very similar results, in effect providing an internal replication.

      (3) Innovative task design, including different gap durations.

      Weaknesses:

      (1) The authors introduce the CPP as tracking an intermediary (motor-independent) evidence integration process, and the MBL as motor preparation that maintains a sustained representation of the decision variable. It would help if the authors could more directly and quantitatively assess whether their current data are in line with this. That is, do these signals exhibit key features of evidence accumulation (slope proportional to evidence strength, terminating at a common amplitude that reflects the bound)? Additionally, plotting these signals report-locked (i.e. aligned to the button press) would help here. What do the results mean for the narrative of this paper?

      The reviewer is correct that properties such as temporal slope scaling with evidence strength and stereotyped threshold-like amplitude were key in establishing that the CPP reflects evidence accumulation in conventional continuous-stimulus tasks, and its motor independence was demonstrated by the fact that it exhibited the same evidence-dependent dynamics in the absence of motor requirements (e.g. O'Connell et al 2012). We agree that it is of interest to check any such properties that can feasibly be tested in the current, distinct task context of intermittent evidence with delayed responses. Given the way in which participants performed our delayed-response task, sometimes terminating decisions early, it is in the CPP-P1 that conventional patterns of coherence-dependence in slope and amplitude would be expected. Indeed, we found that the CPP-P1 reached higher amplitudes (Fig. 3A, Author response image 1) and exhibited a steeper buildup in high- compared to low-coherence trials (Author response image 1). The slope and amplitude profile of the CPP-P2 is complex due to the variability in baseline activity across our various delay conditions and the bounded process that participants engaged in, but it is still consistent with an accumulation process. Our simulations provide a full account of how an accumulating signal could produce the observed results.

      Author response image 1.

      Grand-averaged (± SEM) CPP-P1 traces in both experiments (top). The bottom boxplots show the average slope, computed as the slope between 0.2 s after P1 onset (when the CPP begins its buildup) and the time at which peak amplitude was reached within the 0.4–0.6 s interval, computed for each subject individually. Red crosses indicate outliers, defined as values more than 1.5 times the interquartile range beyond the bottom or top of the box. Grey lines indicate single-subject estimates, and asterisks reflect the significance of paired t-tests for the estimated slope and amplitude effects; **p<0.01, *p<0.05. H = high coherence, L = low coherence.
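      The slope measure described in this caption can be sketched as follows. This is a minimal illustration, not the authors' analysis code: it assumes a single subject's condition-averaged CPP trace sampled on a time axis in seconds relative to P1 onset, takes the slope as the straight-line gradient from 0.2 s to the peak found within 0.4–0.6 s, and flags outliers with the 1.5 × IQR rule stated in the caption. The function names and the synthetic trace are hypothetical.

```python
import numpy as np


def cpp_slope(times, trace, start=0.2, peak_window=(0.4, 0.6)):
    """Hypothetical helper: slope of a CPP trace from `start` (s) to its
    peak within `peak_window` (s), plus the peak amplitude.

    Assumes `times` (s, relative to P1 onset) and `trace` are 1-D arrays of
    equal length. The slope is the straight-line gradient between the two
    samples, in amplitude units per second.
    """
    times = np.asarray(times, dtype=float)
    trace = np.asarray(trace, dtype=float)
    # Sample closest to the assumed buildup onset
    i_start = int(np.argmin(np.abs(times - start)))
    # Indices of samples inside the peak-search window
    in_win = np.flatnonzero((times >= peak_window[0]) & (times <= peak_window[1]))
    i_peak = in_win[np.argmax(trace[in_win])]
    slope = (trace[i_peak] - trace[i_start]) / (times[i_peak] - times[i_start])
    return slope, trace[i_peak]


def iqr_outliers(values, k=1.5):
    """Boolean mask of values lying more than k * IQR beyond the box edges."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Usage with a synthetic ramp that starts at 0.2 s and peaks at 0.5 s
t = np.arange(0, 801) / 1000.0
trace = np.where(t < 0.5, np.clip(t - 0.2, 0, None) * 10, 3 - (t - 0.5) * 10)
slope, peak = cpp_slope(t, trace)
```

Taking a two-point gradient follows the caption's wording literally; a least-squares fit over the same window would be a less noise-sensitive alternative.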

      As in other delayed-response tasks (Twomey et al 2016; McCone et al 2026), we observe here that the CPP peaks and falls well before the response is cued or indeed executed (here, in fact, peaking and falling for each individual pulse). Thus, its pre-response dynamics will not relate to stimulus-driven evidence accumulation in the way they do in immediate-response contexts (e.g. O’Connell et al. 2012; Steinemann et al. 2018). We therefore do not analyse response-aligned CPPs in this experiment.

      As to the intermediary role we have interpreted for the CPP, in addition to the local pulse driven peak-and-fall dynamics compared to the sustained profiles of motor preparation signals, we can point to the obvious temporal delay between the signals, where evidence-dependent buildup in the CPP substantially precedes that of motor preparation, as observed in all previous studies comparing the two (e.g. Kelly & O'Connell 2013).

      (2) The novelty of this work lies partly in the aim to characterize how the CPP and MBL interact (page 5, line 3-5). However, this analysis seems to be missing. E.g., at the single-trial level, do relatively strong CPP pulses predict faster/larger MBL? The simulations in Figure 5 are interesting, but more could be done with the measured physiology.

      As exemplified in the extant EEG-decision literature, the low signal-to-noise ratio of EEG is such that attempts are seldom made to link two EEG signals on a single-trial basis, and studies instead favour testing single-trial relationships between each individual EEG signal and behaviour, or, most commonly, comparing patterns of variation in the EEG signals across experimental conditions (e.g. difficulty). Accordingly, here we show that trials with high coherence P1 evoked 1) higher CPP amplitudes (Fig. 3A,C), and 2) stronger MBL (Fig. S2 & S3). Further, we showed that particularly high CPP amplitudes following the first pulse led to stronger weights on choice for the first pulse (Fig. S11), which could only be mediated by the motor system.

      (3) The focus on CPP and MBL is hypothesis-driven but also narrow. Since we know only a little about the physiology during this "gaps" task, have the authors considered computing TFRs from different sensor groupings (perhaps in a supplementary figure)?

      While we agree that it might be interesting to explore frequency bands and sensors more broadly, we feel that such an exploration would detract from the hypothesis-driven focus on how prominent, well-characterised decision signals in the brain behave in a context where evidence is presented in an atypical, seldom-studied manner, namely in the form of temporally separate pulses. Our aim was not to explore whole-brain dynamics that might be engaged during the task, but rather to get a better understanding of the functional roles of the neural processes underlying the CPP and MBL during decision making. Providing a detailed description of whole-scalp responses is thus beyond the scope of this paper, but given that all data will be made publicly available this can be pursued in future work and by other researchers.

      (4) The idea of a potential bound crossing during P1 is elegant, albeit a little simplistic. I wonder if the authors could more directly show a physiological signature of this. For example, by focusing on the MBL or occipital alpha split by the LL, LH, HL and HH conditions, and showing this pulse- as well as report-locked. Related, a primacy effect can also be achieved by modelling (i) self-excitation of the current one-dimensional accumulator, or (ii) two competing accumulators that produce winner-take-all dynamics. Is it possible to distinguish between these models, either with formal model comparison or with diagnostic physiological signatures?

      In addition to the CPP amplitude effects we report in the main paper, the reviewer is correct that pulse-locked MBL can also provide a physiological signature of the greater number of pulse-1 bound crossings when that pulse is high-coherence. This is shown in Figure S3, where we see this coherence-dependent effect consistently across all gap durations and both experiments. Figure S2 also shows that the MBL step-change after P2 is greater in P1-low coherence trials in Experiment 1, as predicted by the bound-crossing account, and consistent with the CPP findings. We note that this effect appears absent in Experiment 2, but this is likely because the greater proportion of shorter gap durations (0, 0.12 and 0.36 s) means that updates following P2 are likely to still capture P1-driven changes, due to signal-transmission delays. Please also note that Fig. S2 and S3 have been updated from the previous version, because while revising the paper we noticed a mistake whereby we were plotting alpha-band power (8–13 Hz) rather than the intended beta band (13–30 Hz). The results remain qualitatively unchanged. Although there is not sufficient single-trial signal-to-noise ratio to categorise individual trials as having crossed a threshold or not, this is strong evidence in support of the coherence-dependent amplitudes of the CPP and motor updates. Analysing beta locked to the report would not be informative in this case because of the delayed reporting structure of the task and the threshold-crossing relationship beta exhibits with response execution (O’Connell et al. 2012). That is, beta will reach the same amplitude immediately prior to the response regardless of whether or not decisions were terminated during P1. Instead, we believe that the empirical CPP-P2 traces we show provide direct evidence that the second pulse was not fully integrated in all trials, and, as our modelling confirms, this is consistent with bound crossings sometimes occurring before P2. First, the fact that CPP-P2 amplitudes were overall lower than CPP-P1 amplitudes mirrors the behavioural observation that the first pulse had a stronger weight on choice than the second one. Second, we show that trials where the CPP was particularly high after the first pulse were also trials where P1 exerted a particularly strong influence on choice (see Fig. S11), further validating the idea that higher CPP amplitudes are directly related to behaviour.

      Regarding self-excitation (SE) and winner-take-all competition (WTAC), these could indeed contribute to the behavioural primacy effects, but they would not detract from our central finding that the CPP does not encode a sustained representation of a decision variable, but rather reflects two rounds of evidence accumulation feeding into a single decision process. Further, it is not immediately clear whether/how these alternative models might also account for the CPP-P1/CPP-P2 results as simply as our bounded model does. While it might be theoretically possible for SE/WTAC models to explain 1) why the CPP-P2 is generally lower than the CPP-P1 across conditions, and 2) why the maximum CPP-P2 amplitudes in P1-high trials are smaller than in P1-low trials, these patterns of results are not an immediate consequence of standard implementations. Further, while the question of whether the accumulation process is perfect integration or involves SE or WTAC is certainly of additional interest, given that this is a delayed response task and does not provide information on termination timing through RT distributions, arbitrating between these modes of integration would not be straightforward with the current data.

      (5) The way the authors specify the random-effects structure of their linear mixed models should be described in more detail. Currently, they write: "Where possible, we included all main effects of interest as random effects to control for interindividual variability." This sounds as if they started with a model with a full random-effects structure and dropped random components when the model would not converge. This might not be sufficiently principled, as random components could be dropped in many different orders, which would affect the results. Do all main results hold when using classical random-effects statistics on subject-wise regression coefficients?

      The equations in the paper include the full details of the random-effects structure we used for each model. We note that only two of our four equations did not include a full random-effects structure, indeed due to convergence issues. We have now fit these models with a maximal random-effects structure (i.e. including all fixed effects as random effects as well) with the ‘bobyqa’ optimiser. This resulted in singular fits for both Eq. 2 (Exp. 1 and Exp. 2) and Eq. 3 (Exp. 2 only). Following previous suggestions, we used a weakly informative Wishart prior (Chung et al. 2015) to regularise the random-effects covariance matrix using the blme package (Chung et al. 2013), which resolved the singular-fit problem. However, some models still produced convergence warnings. To assess these models’ robustness, we compared the fixed-effect parameter estimates across multiple optimisers, as suggested by the lme4 developers (see the lme4 documentation). Parameter estimates rarely deviated by more than one decimal point across six optimisers (see Bates et al. 2011), and we thus concluded that the model estimates were robust and that the convergence warnings were false positives, a known issue in lme4. For all models in the paper, we report the parameters estimated using the ‘bobyqa’ optimiser. All main inferential results remain unchanged (except for one interaction that was not of interest in Exp. 1), and the estimated slopes and statistical results for all models have been updated in the manuscript. We also included all these details in the manuscript.

      Reviewer #2 (Public review):

      Summary:

      This manuscript examines decision-making in a context where the information for the decision is not continuous, but separated by a short temporal gap. The authors use a standard motion direction discrimination task over two discrete dot motion pulses (but unlike previous experiments, fill the gaps in evidence with 0-coherence random dot motion of differently coloured dots). Previous studies using this task (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023) or other discrete sample stimuli (Cheadle et al., 2014; Wyart et al., 2015; Golmohamadian et al., 2025) have shown decision-makers to integrate evidence from multiple samples (although with some flexible weighting on each sample). In this experiment, decision-makers tended not to use the second motion pulse for their decision. This allows the separation of neural signatures of momentary decision-evidence samples from the accumulated decision-evidence. In this context, classic electroencephalography signatures of accumulated decision-evidence (central-parietal positivity) are shown to reflect the momentary decision-evidence samples.

      Strengths:

      The authors present an excellent analysis of the data in support of their findings. In terms of proportion correct, participants show poorer performance than predicted if assuming both evidence samples were integrated perfectly. A regression analysis suggested a weaker weight on the second pulse, and in line with this, the authors show an effect of the order of pulse strength that is reversed compared to previous studies: A stronger second pulse resulted in worse performance than a stronger first pulse (this is in line with the visual condition reported in Golmohamadian et al., 2025). The authors also show smaller changes in electrophysiological signatures of decision-making (central parietal positivity and lateralised motor beta power) in response to the second pulse. The authors describe these findings with a computational model which allows for early decision-commitment, meaning the second pulse is ignored on the majority of trials. The model-predicted electrophysiological components describe the data well. In particular, this analysis of model-predicted electrophysiology is impressive in providing simple and clear predictions for understanding the data.

      Weaknesses:

      Some readers may be left questioning why behaviour in this experiment is so different from previous experiments, which use almost exactly the same design (Kiani et al., 2013; Tohidi-Moghaddam et al., 2019; Azizi et al., 2021; 2023). The authors suggest this may be due to the staircase procedure used to calibrate the coherence of (single-pulse) dot motion stimuli for individuals at the start of the experiment. But it remains unclear why overall performance in this experiment is so poor. Participants achieved ~85% correct following 400 ms of 33–45% coherent motion. In previous work, performance was ~90% correct following 240 ms of 12.8% coherent motion. It seems odd that adding the 0% coherent motion in the temporal gaps would impair performance so greatly, given it was clearly colour-coded. There is a lack of the detail about stimulus presentation parameters needed to understand whether visual processing explains the decline in performance, or whether there is a more cognitive/motivational explanation.

      We thank the reviewer for highlighting this. We apologise for not providing full details about the visual display, which we have included now.

      The moving dots were presented centrally on the monitor within a 5-degree aperture, moving at a speed of 5 degrees/s. The monitor refresh rate was 60 Hz for 19 participants and 85 Hz for 3 participants in Experiment 1, and 85 Hz for 19 participants and 60 Hz for 2 participants in Experiment 2. Dot density in our task was similar to that in previous studies (16.7 dots/deg²/s, as in Kiani & Shadlen 2013; Tohidi-Moghaddam et al. 2019; Azizi et al. 2021, 2023). However, in contrast to previous studies, we did not provide feedback on a trial-by-trial basis, instead only providing feedback at the end of each block indicating the average accuracy. This would have made it harder for participants to continually assess how well they were performing and to adjust their strategies accordingly (e.g. increase their bound for better accuracy). We agree that the inclusion of 0%-coherence dots during the gap between pulses is unlikely to have caused the participants’ relatively low overall performance, especially since we did not find accuracy to be lower overall for longer 0%-coherence gaps.

      Further, as the reviewer notes, we used a staircasing procedure at the beginning of the experiment which used only single pulses of evidence. This may have encouraged participants to set a bound that can usually be reached by one pulse, and the resultant early terminations meant that they seldom used the full 400ms of evidence that were available to them. In fact, we would like to thank the reviewer for pointing out Golmohamadian et al., 2025, which used a similar variable delays task structure but with different visual stimuli. They, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, also like us, reported a stronger choice reliance on pulse-1. This suggests that these two factors may suffice to induce a primacy rather than a recency effect.

      There are other reasons why performance may have been different in our task compared to previous studies. For example, our task included a lead-in period that was longer than in previous studies and contained 0%-coherence dots, in order to minimise interfering VEP components (the lead-in period was between 700 and 1050 ms in our study, compared to 200–500 ms in Kiani & Shadlen 2013, Tohidi-Moghaddam et al. 2019 and Azizi et al. 2023, and 400–1000 ms in Azizi & Ebrahimpour 2021). This longer and visually explicit preparation period may have acted as a warning cue, allowing participants to fully prepare before the first pulse, and again making it easier for them to hit a bound with only that information.

      We have added to the Discussion a more detailed account of how our stimuli and task characteristics may have produced substantially different performance in our task compared to previous studies.

      Recommendations for the authors:

      Reviewing Editor:

      Please consider the following reviewer suggestions for how to strengthen the evidence for your central claims, which could translate into an improved assessment of the "strength of evidence".

      Apart from these useful suggestions, I had some concerns about scholarship, because the list of studies currently cited in your introduction is exclusively from your group, while one of the phenomena of interest - motor beta power lateralization (MBL) in decision-making - has been widely studied by several groups, using also other techniques.

      I was wondering why you chose not to cite the ample MEG evidence for the role of MBL in decision-making. This has been shown both in classical random dot motion tasks (Donner et al, Curr Biol, 2009; de Lange et al, J Neurosci, 2013; Pape et al, Nat Commun, 2016; Urai et al, Nat Commun, 2022) as well as in tasks involving discrete evidence samples (Wilming et al, Nat Commun, 2020; Murphy et al, Nat Neurosci, 2021). Another relevant EEG study is by Ian Gould et al, J Neurosci, 2010. There is also quite a bit of monkey LFP work (mainly by Saskia Haegens) on choice-selective beta power in the motor system of the macaque, although the link to the lateralized beta power suppression in your work and the above human E/MEG studies remains a bit elusive. I feel it would be important to provide a more balanced reflection of the existing literature on this phenomenon.

      We thank the editor for this fair comment, and we apologise for having provided a too narrow, EEG-centric view of the literature, arising from our interest in the CPP component which hasn’t yet been characterised in MEG or LFPs. We have now substantially expanded the introduction to provide a more balanced and comprehensive overview of the literature.

      Reviewer #1 (Recommendations for the authors):

      (1) The diffusion model needs to be explained in more detail. For example, it should be explicitly stated that the model was fit only to choices, as most readers would expect reaction times. Further, it needs to be stated whether the model was fit separately for each subject or in one go to the group-level data. If the former, it is important to add error bars showing the between-subjects variability (in simulated and empirical data) to Figure 4A. If the latter, it would be important to determine uncertainty using bootstrapping.

      The original model was fit to grand-average data, as stated in the methods section. To assess between-subjects variability, we have re-fitted the model to each individual subject, for each experiment. The average of the individually-estimated model parameters closely recapitulated the values obtained from the fit to grand-averaged data (Fig. S12). We then simulated N = 10000 trials for each individual, and we report the grand-averaged results with error bars indicating the standard error of the mean as a supplementary figure (Fig. S13). The results replicate the ones reported in the main manuscript. We have also made it explicit that the models are fit to accuracy data but not RT.

      (2) The authors write numerous times that the MBL exhibits an "evidence-dependent" buildup. However, should this not be "choice-dependent"? In Figure 2A, one can clearly see that the sign of MBL follows choice and not objective evidence.

      We thank the reviewer for this comment. By evidence-dependent, we mean that lateralisation towards the correct response is strongest in high-coherence trials (see Fig. S2, S3). This is indeed because the sign of MBL is choice-dependent, and participants are less likely to make mistakes in high-coherence trials. We have added a clarification sentence in the text.

      (3) It would aid readability to add sub-conclusions at the end of each Results section.

      We have added clarifications where needed.

      (4) In Figure 1B, I cannot see a dashed line for the HL condition. I understand that it must lie under the LH condition, but it would be good to show it separately.

      We thank the reviewer for this comment. Since we cannot show both lines separately without additional panels, given the HL and LH lines perfectly overlap, we indicate at the end of the caption that this is the case as follows: “Note that a perfect accumulator predicts identical accuracies for the HL and LH conditions, and therefore the two lines overlap.”

      (5) In Figure 4B, is the horizontal dashed line important? It is confusing because the legend incorrectly states that this is "data".

      Thanks for this observation. The line was only there to indicate 50% as a benchmark for assessing how frequent early terminations are, but we agree that it was unnecessary and potentially confusing, so we have removed it from the plot.

      Reviewer #2 (Recommendations for the authors):

      (1) The authors should more directly address how behaviour in their task differs quite substantially from previous experiments with very similar designs (including why such high coherence levels are required, over a longer duration, to reach overall worse performance). Some readers may also be interested in a broader discussion of how decision-makers may use flexible weights when integrating evidence across samples over time. While the explanation of bounded accumulation is convincing in this context, Tsetsos et al. (2012) suggest that recency effects (as in Cheadle et al., 2014; Wyart et al., 2015) cannot be explained by bounded accumulation, but rather by integration leak. Other factors may include stimulus consistency (Glickman et al., 2022) or even choice consistency across decisions (Bronfman et al., 2015). Golmohamadian et al., 2025 demonstrated flexibility in decision strategies across sensory modalities.

      As we described above, we have added some more detailed explanation about why it might be the case that behaviour in our study differs from previous reports using similar tasks. We agree that the reversed pulse-reliance in our study compared to others presents an opportunity to discuss flexibility in decision strategy and so we have now added a broader discussion on different patterns of integration in various task contexts. We thank the reviewer for pointing out Golmohamadian et al., 2025, as they, like us, trained on a single-pulse task version and omitted trial-by-trial feedback in the main task, and, like us, reported a stronger choice reliance on pulse-1.

      (2) Another open question is how central parietal positivity reflects an accumulation signal in the case of continuous evidence, but reflects momentary evidence in the case of discrete evidence samples. If, in both cases, the parietal evidence is passed along to motor processes for bounded decision commitment, how do motor processes deal with the changes in what is represented? Can the relationship between MBL and CPP in the model-simulated data shed some light on this? Specifically, how is the 0-gap condition treated in this simulation (which shows only 1 CPP peak but with a longer time to decay) compared to non-zero gap conditions (which show 2 peaks)?

      This is a very interesting and important point, and we thank the reviewer for raising it. We believe that the CPP in our intermittent-dots task reflects dot-motion evidence integration in the same way as in conventional continuous-evidence tasks, building at an evidence-dependent rate (see Author response image 1), with the only difference being that integration processes can be turned “on” or “off” depending on whether evidence is present, and can thus be temporally split into multiple “rounds” of accumulation when there is a gap.

      Our model simulations assume that evidence integration is triggered by the dots turning yellow, indicating the presence of evidence, and feeds continuously to the motor system during these periods. However, it is switched off either when 1) a bound has been hit, or 2) the dots turn blue again, at which point the CPP falls (see various rates of signal decay in Fig. S7). By this account, the CPP continues to build for longer before it peaks and falls in the zero-gap condition because there is no dot-colour change at the end of pulse-1 to switch it off, and thus the accumulation process continues until either a bound is hit or the yellow dots turn blue after pulse-2. When there is a non-zero gap, although the CPP is switched off, the decision variable itself remains encoded at the motor level so that no information is lost. This requires that the same instruction that turns off the CPP must also break or pause the flow from the CPP to the motor level and allow the motor level to hold its current value until either a second pulse resumes a feed from a newly triggered CPP, or response execution is cued. Thus, in our account, the accumulation process underlying the CPP in our intermittent-evidence task is identical to that in conventional continuous-evidence tasks, but because it can be turned “on” and “off” according to whether evidence is clearly present or absent, it produces two “rounds” of integration in non-zero gap conditions. The motor process also receives a feed from the CPP as in conventional continuous-evidence tasks, but with this feed similarly gated by the presence of evidence.
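      The gating account described above can be illustrated with a minimal simulation sketch. It is not the fitted model from the paper: it assumes discrete time steps, Gaussian momentary evidence, a single symmetric bound applied to the motor-level decision variable, and an instantaneous drop of the CPP to zero whenever the gate closes (the empirical CPP decays more gradually). The function name and all parameter values are hypothetical.

```python
import numpy as np


def simulate_gated_trial(drift_p1, drift_p2, gap_steps, pulse_steps=40,
                         bound=1.0, noise_sd=0.1, rng=None):
    """Hypothetical sketch of one trial of a gated, bounded accumulator.

    Returns per-step arrays `cpp` (evidence accumulated since the gate last
    opened) and `motor` (decision variable held at the motor level), plus
    the final motor-level decision variable.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Evidence schedule: pulse 1, gap (None = no evidence, gate closed), pulse 2
    schedule = ([drift_p1] * pulse_steps + [None] * gap_steps
                + [drift_p2] * pulse_steps)
    cpp_trace, motor_trace = [], []
    cpp, motor, committed = 0.0, 0.0, False
    for drift in schedule:
        if drift is None or committed:
            # Gate closed or bound already hit: the CPP is switched off
            # (assumed to drop to zero) and the motor level holds its value.
            cpp = 0.0
        else:
            step = drift + noise_sd * rng.standard_normal()
            cpp += step      # within-round accumulation (CPP)
            motor += step    # same evidence fed forward to the motor level
            if abs(motor) >= bound:
                committed = True  # decision terminated at the motor level
        cpp_trace.append(cpp)
        motor_trace.append(motor)
    return np.array(cpp_trace), np.array(motor_trace), motor


# Noise-free usage: weak evidence, 20-step gap, bound never reached
cpp, motor, final = simulate_gated_trial(0.01, 0.01, gap_steps=20, noise_sd=0.0)
```

With `gap_steps=0` the gate never closes between pulses, so the simulated CPP keeps building across both pulses, mirroring the single extended peak in the zero-gap condition; with a non-zero gap the CPP resets while the motor-level trace holds its value.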

      A slightly different and perhaps more challenging question (which the reviewer was perhaps alluding to) relates to tasks where evidence comes not in short noisy snippets, but rather as static tokens (e.g. Wyart et al. 2012, 2015; Murphy et al. 2021; Parés-Pujolràs et al. 2025). In these instances, the CPP exhibits transient evoked responses to each token, which scale with the belief updates resulting from it (Parés-Pujolràs et al. 2025). However, it remains unclear whether these transient potentials reflect a temporally-evolving integration process to compute the appropriate belief update afforded by that token in the context of a particular task, or rather reflect the output of such a process. The former account would be similar to our interpretation of the transient deflections observed in this gaps task, which we believe capture the same temporal integration processes as those commonly observed in conventional continuous noisy stimuli paradigms, only short-lived. The latter account would instead be specific to low-noise stimuli like tokens, where the computations required for belief updating may not require a temporally-extended integration process, but rely on different mechanisms to compute belief updates (e.g. prior-based modulations of sensory encoding, attention or neural gain). These questions remain open for future investigation.

      (3) From what I understand, the model suggests all-or-none integration of the second pulse: either the bound has not been reached and the pulse is perfectly integrated, or the bound has been reached and so the pulse is not integrated. The CPP amplitude at pulse 2 is therefore determined not only by the strength of the evidence at pulse 2 but also by the proportion of trials where the evidence is not ignored: CPP at pulse 2 is of lower amplitude because it is calculated as an average across trials where it is either similar to CPP at pulse 1 or otherwise completely absent. Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper). It would be nice to show the dichotomy predicted by the model in the empirical data. I'm thinking of something similar to this 'bifurcation' analysis from Sergent et al., 2021. Or more simply, estimates of CPP amplitude from single trials (perhaps an average over a short window around the peak) should be more variable at pulse 2, with some reaching similar amplitudes to pulse 1, and many close to baseline, whereas at pulse 1, there should be a more uniform cluster of amplitudes. If all CPP peak amplitudes were lower, would this motivate a model comparison where, for example, additional evidence from the second pulse was down-weighted according to certainty following the first pulse (leading to all trials down-weighting the second pulse)? This could link in nicely with some of the more nuanced analyses related to attention in the supplementary figures.

      We thank the reviewer for this insightful comment, which will help us clarify how our model works. The integration of the second pulse does not work in an all-or-none manner. In our model, accumulation stops whenever a bound is reached at the downstream motor level. The bound can be reached 1) at some point during the 1st pulse (no integration of pulse 2 at all), 2) during the 2nd pulse (partial integration of pulse 2, until the bound is hit), or 3) not at all (full integration of pulse 2). Our model thus allows for partial integration of the second pulse rather than all-or-none integration. Author response image 2 shows 3 example trials that illustrate how the model works. The CPP amplitudes at pulse 2 are thus determined by two main factors: 1) whether or not accumulation of P2 is precluded by an earlier bound crossing in P1 (if it is, the CPP amplitude is assumed to equal 0), and 2) if accumulation did take place, whether and when it ended. Our interpretation is that, given that trials where pulse 1 was low coherence were 1) less likely to terminate early (Fig. 4B) and 2) had achieved lower levels of accumulated evidence (Fig. 4C), the LL and LH conditions are linked to a higher proportion of trials where accumulation at pulse 2 does occur, and it lasts for longer because the distance required to reach a bound is greater than in their pulse-1 high-coherence counterparts. We have clarified this point in the results section describing the model.
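      The three possible outcomes of a trial under this account (termination during P1, partial integration of P2, full integration of P2) can be tallied with a short Monte Carlo sketch. This is illustrative only: the drift values standing in for low (L) and high (H) coherence, the noise level, the bound and the pulse length are hypothetical; the gap is omitted because the motor-level decision variable is assumed simply to hold during it; and the evidence accumulated during P2 is used as a crude proxy for CPP-P2 amplitude, set to 0 when P1 already reached the bound, as in the model.

```python
import numpy as np


def classify_trial(drift_p1, drift_p2, pulse_steps=40, bound=1.0,
                   noise_sd=0.05, rng=None):
    """Hypothetical sketch: return (phase, p2_evidence) for one trial.

    phase is "P1" if the bound was hit during pulse 1 (P2 ignored),
    "P2" if it was hit during pulse 2 (P2 partially integrated), or
    "none" if it was never hit (P2 fully integrated).
    """
    rng = np.random.default_rng() if rng is None else rng
    dv = 0.0
    for _ in range(pulse_steps):
        dv += drift_p1 + noise_sd * rng.standard_normal()
        if abs(dv) >= bound:
            return "P1", 0.0  # no accumulation of P2 at all
    p2_total = 0.0
    for _ in range(pulse_steps):
        step = drift_p2 + noise_sd * rng.standard_normal()
        dv += step
        p2_total += step
        if abs(dv) >= bound:
            return "P2", p2_total  # accumulation of P2 cut short
    return "none", p2_total


def summarise(conditions, n_trials=2000, seed=0):
    """Per condition: fraction of P1 terminations and mean P2 evidence."""
    rng = np.random.default_rng(seed)
    out = {}
    for label, (d1, d2) in conditions.items():
        phases, p2 = zip(*(classify_trial(d1, d2, rng=rng)
                           for _ in range(n_trials)))
        phases = np.array(phases)
        out[label] = {
            "p_term_P1": float(np.mean(phases == "P1")),
            "mean_p2_evidence": float(np.mean(p2)),
        }
    return out


# Hypothetical drifts: 0.01 for low coherence, 0.03 for high coherence
conds = {"LL": (0.01, 0.01), "LH": (0.01, 0.03),
         "HL": (0.03, 0.01), "HH": (0.03, 0.03)}
res = summarise(conds, n_trials=2000, seed=1)
```

With these assumed values, conditions with a high-coherence P1 terminate during P1 far more often, and their mean P2 evidence is correspondingly smaller, which is the qualitative pattern the model uses to explain the lower CPP-P2 amplitudes in P1-high trials.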

      The reviewer notes: “Another explanation for the lower average amplitude is that all trials have a smaller amplitude (somewhat different from the main conclusions of the paper)”. However, our interpretation in fact predicts that the vast majority of trials should exhibit smaller amplitudes. This again follows from the three trial types described above. Unlike for CPP-P1, there would be a majority of trials in which integration does not occur at all. Only trials in which evidence was at least partially integrated during P2 would be predicted to have overall positive CPP-P2 amplitudes, and even in those instances, average amplitudes would be lower than CPP-P1 in trials that terminated early, because less distance remains to be covered before hitting a bound. Author response image 2 illustrates this point. Thus, the prediction for how the variance or distribution shape of CPP amplitudes would compare between P1 and P2 is less straightforward than it would be under all-or-none integration of P2, not to mention that EEG noise would likely drown out distributional features like this. We therefore focus on a comparison of the means, for which our model makes the clear prediction that most trials should exhibit lower CPP-P2 amplitudes. To assess whether the empirical data meet this prediction, and following the reviewer’s suggestion, we extracted the mean amplitude 0.45-0.55 s after P1 and P2 for each single trial. CPP-P2 data were baselined using the amplitude 100 ms before P2 onset, as in Fig. S5. Note that this is likely to introduce spurious drifts due to overlapping potentials from P1, but given that grand-averaged traces still qualitatively captured the key effects, we assume it is a valid approach. We then pooled the CPP-P1 and CPP-P2 amplitudes and z-scored them for each participant separately. In both experiments, in a majority of participants (Exp. 1: 16/22, Exp. 2: 17/21), the median z-scored CPP-P1 amplitude was higher than that of CPP-P2. Author response image 3 illustrates the pooled distributions.
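
      The single-trial analysis described above can be sketched as follows. This is a hypothetical illustration, not the code used for Author response image 3. It assumes that each trial is a list of voltages with matching time stamps in seconds, that "the amplitude 100 ms before onset" means the mean over the 100 ms preceding onset, and that z-scoring uses the population standard deviation of each participant's pooled P1 and P2 amplitudes.

```python
from statistics import mean, median, pstdev


def window_mean(times, values, start, stop):
    """Mean of samples whose time t satisfies start <= t <= stop (seconds)."""
    return mean(v for t, v in zip(times, values) if start <= t <= stop)


def single_trial_amplitude(times, values, onset, baseline=0.1, win=(0.45, 0.55)):
    """Mean amplitude 0.45-0.55 s after onset, minus the pre-onset baseline."""
    base = window_mean(times, values, onset - baseline, onset)
    amp = window_mean(times, values, onset + win[0], onset + win[1])
    return amp - base


def zscore(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def count_p1_greater(amps_by_participant):
    """amps_by_participant maps participant id -> (p1_amps, p2_amps).

    P1 and P2 amplitudes are pooled and z-scored within each participant;
    returns how many participants have median z(P1) > median z(P2).
    """
    count = 0
    for p1, p2 in amps_by_participant.values():
        z = zscore(list(p1) + list(p2))
        z1, z2 = z[:len(p1)], z[len(p1):]
        if median(z1) > median(z2):
            count += 1
    return count
```

      Because the z-scoring is done within participant, this comparison is insensitive to between-participant differences in overall EEG amplitude, which is one reason a per-participant count (such as 16/22) is a natural summary.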

      Author response image 2.

      Decision variable simulations illustrating sample single trials (top) and CPP traces averaging data across conditions and N = 1000 trials (bottom), using model fits from Exp 2, in the long gap condition. Overlaid text indicates the percentage of trials in each subset, for each condition. The horizontal line indicates the bound; shaded areas indicate pulse presentation times. A. The bound was hit during P1, and therefore no further accumulation occurred during P2. B. The bound was hit during P2, and therefore P2 was only partially accumulated. C. No bound was hit, and therefore all evidence from P2 was accumulated.

      Author response image 3.

      Pooled distributions of CPP-P1 and CPP-P2 amplitudes (450-550 ms post-pulse), normalised within participant and baselined 100 ms before pulse onset. In both experiments, CPP-P2 amplitudes had a lower median normalised amplitude (vertical line) than CPP-P1.

      (4) A minor note: Full details of stimulus presentation (size, number of dots, dot size, speed, lifetime) would be appreciated.

      Thank you - we have now provided these details in the methods section (see also reply to public reviews above).

      (5) Are the authors sure they want to use this 'Gaps task' name? It seems a bit strange to introduce this name in this context, where there isn't really a 'Gap' (random dot motion fills the gap). A reader could get the impression the name was given in the Kiani et al., 2013 study (page 3, paragraph 1: "This scenario has begun to be studied using an intermittent-evidence or "gaps" task (Kiani et al., 2013) ...") but this is not true, Kiani et al. never use the term "Gaps task", nor has any other study since (as far as I know).

      We thank the reviewer for noting this oversight on our part. We now state clearly in the introduction that “gaps task” is our name for the task originally developed by Kiani et al. (2013). We have decided to keep this name because it is a convenient shorthand, with the understanding that “gap” refers to a gap in coherent motion as in Kiani et al. (2013), albeit not a true blank as in the original implementation.

    1. eLife Assessment

      This study provides valuable insights with convincing evidence detailing altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel and quantitative tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. The behavioral experiments are well executed and set the stage for subsequent mechanistic, causal, and computational approaches. The work is relevant to those interested in autism, cognition, and/or sensory processing.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative.

    4. Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides valuable insights with solid evidence into altered tactile perception in a mouse model of ASD (Fmr1 mice), paralleling sensory abnormalities in Fragile X and autism. Its main strength lies in the use of a novel tactile categorization task and the careful dissection of behavioral performance across training and difficulty levels, suggesting that deficits may stem from an interaction between sensory and cognitive processes. However, while the experiments are well executed, the reported effects are subtle and sometimes non-significant. The interpretation of results may be overextended given the nature of the data (solely behavioral), the reliance on repeated d′ measures may obfuscate some of the results without clearer psychometric or regression-based analyses, and the absence of mechanistic, causal, or computational approaches limits the strength of the broader conclusions. The work will be relevant to those interested in autism, cognition, and/or sensory processing.

      We thank the editors for their positive assessment of the data quality and the novelty of our behavioral task, and for pointing out the limitations inherent in behavioral studies.

      We would like to clarify one important point regarding the use of d′ measures. While d′ was included to quantify sensitivity, our conclusions are not based solely on repeated d′ measures. In addition to d′, we analyzed raw behavioral data (correct and incorrect choice rates), and categorization performance was assessed using psychometric curves fitted with logistic regression models. These complementary analyses provide converging evidence and ensure that our interpretations are supported by multiple robust measures.

      In the revised manuscript, we have further strengthened the analyses by including additional regression-based assessments, reporting effect sizes for subtle effects, and refining the statistical methods for clarity and transparency.

      We fully acknowledge that this work is behavioral and does not directly reveal the underlying neural mechanisms. Nonetheless, the translational framework we have developed establishes a robust foundation for future studies. This platform can be directly applied in clinical research on autism and other neuropsychiatric conditions involving sensory-cognitive interactions, and provides a solid basis for subsequent mechanistic, causal, or computational investigations to uncover the neural circuits mediating these effects.

      We greatly appreciate the editors’ and reviewers’ guidance and believe the revisions have clarified and strengthened the manuscript.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study addresses the important question of how top-down cognitive processes affect tactile perception in autism - specifically, in the Fmr1-/y genetic mouse model of autism. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention.

      We appreciate the reviewer’s statement highlighting the importance of our study.

      Strengths:

      The experiments seem well performed, with interesting results. Thus, this study can/will advance our understanding of atypical tactile perception and its relation to cognitive factors in autism.

      We thank the reviewer for recognizing the quality of our experiments and the relevance of our findings for understanding tactile perception and cognition in autism.

      Weaknesses:

      Certain aspects of the analyses (and therefore the results) are unclear, which makes the manuscript difficult to understand. Clearer presentation, with the addition of more standard psychometric analyses, and/or other useful models (like logistic regression) would improve this aspect. The use of d' needs better explanation, both in terms of how and why these analyses are appropriate (and perhaps it should be applied for more specific needs rather than as a ubiquitous measure).

      We thank the reviewer for these constructive comments. We acknowledge that aspects of the analyses were previously difficult to follow, and we have reworked the Results section to improve clarity and transparency.

      We would like to emphasize that all d′ measures are complemented by analyses of raw response rates (correct and incorrect choices), ensuring that our interpretations do not depend solely on this metric. In addition, we applied standard psychometric analyses wherever possible. During the training phase, only two stimulus amplitudes were presented, which precluded the construction of full psychometric curves; for the categorization phase, however, psychometric analyses were feasible and are reported in Figure 3. Specifically, psychometric functions were fitted to the data using logistic regression, allowing us to estimate both categorization bias (threshold) and precision (slope) across stimulus intensities. These analyses revealed no genotype differences in categorization bias or precision in Fmr1-/y mice across stimulus strengths.
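
      For readers less familiar with this approach, a psychometric fit by logistic regression can be sketched in a few lines. This is a minimal illustration, not necessarily the fitting procedure used in the manuscript: it assumes binary trial-level choices (1 = "high" category) and a two-parameter logistic without lapse terms, fitted by Newton-Raphson. The function name fit_psychometric is hypothetical. The threshold is the amplitude at which P("high") = 0.5, and the slope is the logistic coefficient on amplitude.

```python
import math


def fit_psychometric(amplitudes, choices, n_iter=50, tol=1e-10):
    """Fit P(high) = 1 / (1 + exp(-(b0 + b1 * (x - x_mean)))) by Newton-Raphson.

    Returns (threshold, slope): threshold is the amplitude where P(high) = 0.5.
    """
    x_mean = sum(amplitudes) / len(amplitudes)
    xs = [x - x_mean for x in amplitudes]  # centre amplitudes for stability
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, choices):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # X'WX (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    threshold = x_mean - b0 / b1
    return threshold, b1
```

      A genotype difference in bias would then appear as a shifted threshold, and a difference in precision as a different slope.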

      Following the reviewer’s suggestion, we have also added general linear model analyses that account for trial history, providing a complementary perspective on decision-making dynamics. Finally, while the calculation of d′ is detailed in the Methods, we have revised the Results to clearly explain its use and appropriateness in each relevant analysis.
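
      As an illustration of what a trial-history GLM involves, the design matrix can be built by pairing each trial's stimulus with regressors describing the preceding trial. The regressor set below (previous choice, previous reward, and their product as a "win-stay" term) is an assumption for illustration only; the exact terms of the model in the manuscript are not specified here, and history_design_matrix is a hypothetical name. The resulting rows could be passed to any logistic regression routine.

```python
def history_design_matrix(stimuli, choices, rewards):
    """Build trial-history regressors for a logistic GLM of choice.

    stimuli: signed or standardised stimulus strength per trial
    choices: +1 for a 'high' choice, -1 for a 'low' choice, per trial
    rewards: 1 if the trial was rewarded, 0 otherwise
    Returns (X, y) for trials t >= 1, with rows
    [intercept, stim_t, choice_{t-1}, reward_{t-1}, choice_{t-1} * reward_{t-1}]
    and y[t] = 1 if the current choice is 'high'.
    """
    X, y = [], []
    for t in range(1, len(stimuli)):
        prev_choice, prev_reward = choices[t - 1], rewards[t - 1]
        X.append([1.0, stimuli[t], prev_choice, prev_reward,
                  prev_choice * prev_reward])
        y.append(1 if choices[t] == 1 else 0)
    return X, y
```

      The first trial of each session is dropped because it has no history; a near-zero weight on the previous-reward terms would correspond to the absence of reward-history effects on current choices.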

      These revisions aim to provide a clearer, more comprehensive picture of the data while ensuring that all conclusions are supported by multiple complementary measures.

      Reviewer #2 (Public review):

      Summary:

      This manuscript presents a tactile categorization task in head-fixed mice to test whether Fmr1 knockout mice display differences in vibrotactile discrimination using the forepaw. Tactile discrimination differences have been previously observed in humans with Fragile X Syndrome, autistic individuals, as well as mice with loss of Fmr1 across multiple studies. The authors show that during training, Fmr1 mutant mice display subtle deficits in perceptual learning of "low salience" stimuli, but not "high salience" stimuli, during the task. Following training, Fmr1 mutant mice displayed an enhanced tactile sensitivity under low-salience conditions but not high-salience stimulus conditions. The authors suggest that, under 'high cognitive load' conditions, Fmr1 mutant mouse performance during the lowest indentation stimuli presentations was affected, proposing an interplay of sensory and cognitive system disruptions that dynamically affect behavioral performance during the task.

      Strengths:

      The study employs a well-controlled vibrotactile discrimination task for head-fixed mice, which could serve as a platform for future mechanistic investigations. By examining performance across both training stages and stimulus "salience/difficulty" levels, the study provides a more nuanced view of how tactile processing deficits may emerge under different cognitive and sensory demands.

      We thank the reviewer for emphasizing the strengths of our task design and analysis approach, and we appreciate that the potential of this platform for future mechanistic investigations is recognized.

      Weaknesses:

      The study is primarily descriptive. The authors collect behavioral data and fit simple psychometric functions, but provide no neural recordings, causal manipulations, or computational modeling. Without mechanistic evidence, the conclusions remain speculative.

      We thank the reviewer for the careful reading of our manuscript and for these constructive comments. We agree that our study is purely behavioral, and we appreciate the opportunity to clarify the scope and interpretation of our findings. The primary goal of this work was to characterize behavioral patterns during tactile discrimination and categorization in a translationally relevant mouse model of autism.

      Although we did not include direct neural recordings, causal manipulations, or computational modeling, our analyses combining choice behavior, sensitivity measures from signal detection theory, psychometric curves, and regression-based models of trial history provide a detailed and robust characterization of perceptual learning, stimulus discrimination, categorization, and the interplay of cognitive processes with tactile perception. The manuscript has been revised to explicitly state that our conclusions are behavioral, emphasizing that this work establishes a foundation for future studies aimed at elucidating the neural and circuit mechanisms underlying these sensory–cognitive interactions.

      Second, the authors repeatedly make strong claims about "categorical priors," "attention deficits," and "choice biases," but these constructs are inferred indirectly from secondary behavioral measures. Many of the effects are based on non-significant trends, and alternative explanations (such as differences in motivation, fatigue, satiety, stereotyped licking, and/or reward valuation) are not considered.

      Alternative explanations for our findings, including differences in motivation, fatigue, satiety, stereotyped licking, or reward valuation, were carefully considered. As described in the Methods, only testing sessions with >70% correct performance on the training stimuli (12 µm and 26 µm) were included, thereby excluding sessions in which reduced motivation, fatigue, satiety, or stereotyped licking could confound performance on low- or high-salience stimuli.
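
      The session-inclusion rule can be expressed as a simple filter. In this hypothetical sketch, performance is pooled across the 12 µm and 26 µm training stimuli and the 70% threshold is applied as a strict inequality; whether the criterion was applied per stimulus or pooled is an assumption here, and include_session is an illustrative name.

```python
TRAINING_AMPLITUDES_UM = (12, 26)  # training stimuli named in the Methods


def include_session(trials, criterion=0.70):
    """trials: iterable of (amplitude_um, correct) pairs for one session.

    Keeps the session only if the proportion of correct responses on the
    training amplitudes (pooled, an assumption) exceeds the criterion.
    """
    outcomes = [correct for amp, correct in trials if amp in TRAINING_AMPLITUDES_UM]
    if not outcomes:
        return False
    return sum(outcomes) / len(outcomes) > criterion
```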

      Although differences in reward valuation could affect learning speed, we observed no genotype differences in training duration (Fig. 1B-D, Fig. S1C-D). Sessions with disengagement were analyzed only during epochs of active task performance (information added to the revised Methods section, lines 619-620). Reward-driven choice biases were unlikely, as no genotype differences were observed in categorization bias (Fig. 3F) and GLM analyses confirmed that previous reward outcome did not affect current choices (Fig. 4D).

      Finally, altered reward valuation could increase miss rates. Elevated miss rates in Fmr1-/y mice were restricted to the lowest-intensity stimulus (12 µm) under high cognitive load, demonstrating a salience- and context-specific effect inconsistent with generalized motivational or reward deficits. The Discussion has been updated to clarify these points and delimit the scope of our interpretations (lines 483-499).

      Third, the mapping of the behavioral results onto high-level cognitive constructs is tenuous and overstated. The authors' interpretations suggest that they directly tested cognitive theories such as Load Theory, Adaptive Resonance Theory, or Weak Central Coherence. However, the experiments do not manipulate or measure variables that would allow such theories to be tested. More specific comments are included below.

      This implication was not intended. References to Load Theory were meant to provide conceptual inspiration for assessing attention under high cognitive load during categorization, rather than to indicate a formal test. Moreover, we do not claim to have tested the Weak Central Coherence theory, although our results suggest reduced facilitation of across-category discrimination. Finally, we agree that citing Adaptive Resonance Theory, which is grounded in artificial neural network models, could be misleading, and we have revised the text accordingly.

      (1) The authors employ a two-choice behavioral task to assess forepaw tactile sensitivity in Fmr1 knockout mice. The data provide an interesting behavioral observation, but it is a descriptive study. Without mechanistic experiments, it is difficult to draw any conclusions, especially regarding top-down or bottom-up pathway dysfunctions. While the task design is elegant, the data remain correlational and do not advance our mechanistic understanding of Fmr1-related sensory and/or cognitive alterations.

      We thank the reviewer for this comment and agree that our study is purely behavioral and does not provide direct mechanistic evidence for top-down pathway dysfunction. In the first version of the manuscript, the term “top-down” was used at the behavioral level, referring to the influence of higher-order cognitive processes (e.g., categorization, attention, sensory and choice history integration) on tactile perception, rather than to imply specific neural circuits.

      We acknowledge that identifying the neural pathways underlying these effects would require extensive mechanistic experiments, including identifying the specific top-down pathway that modulates the influence of categorization on discrimination without directly altering categorization itself and performing pathway-specific recordings and manipulations. Such work represents a substantial mechanistic research program beyond the scope of the present study.

      To clarify that our study does not provide insights into the neural underpinnings of the studied behavioral processes, we have revised the manuscript, removing the term “top-down” or replacing it with “higher-order processes” where appropriate. We also explicitly noted that future work using neural recordings or causal manipulations will be needed to uncover the neural underpinnings of these behavioral phenomena (lines 508-510).

      (2) The conclusions hinge on speculative inferences about "reduced top-down categorization influence" or "choice consistency bias," but no neural, circuit-level, or causal manipulations (e.g., optogenetics, pharmacology, targeted lesions, modeling) are used to support these claims. Without mechanistic data, the translational impact is limited.

      We recognize that terms such as “reduced top-down categorization influence” and “choice consistency bias” are derived from behavioral observations. However, we respectfully note that these behavioral inferences are widely used in clinical studies to characterize cognitive tendencies (Soulières et al., 2007; Feigin et al., 2021) and are not inherently speculative.

      The translational impact of our work lies in the development of a robust behavioral platform that allows precise dissection of tactile perception and cognitive influences in a manner directly comparable to clinical studies. While we agree that neural, circuit-level, or causal manipulations would provide valuable mechanistic insight, the current study establishes a foundational behavioral framework that can guide and inform future investigations into the underlying neurobiological substrates.

      To ensure clarity, we have revised the manuscript throughout to explicitly indicate that all conclusions are based on behavioral measures and do not imply mechanistic evidence.

      (3) Statistical analysis:

      (a) Several central claims are based on "trends" rather than statistically significant effects (e.g., reduced task sensitivity, reduced across-category facilitation). Building major interpretive arguments on non-significant findings undermines confidence in the conclusions.

      We chose to present both statistically significant effects and trends to ensure transparency and to highlight that commonly used aggregate measures, such as d′, can sometimes obscure meaningful underlying patterns. In the text, p-values between 0.05 and 0.1 are described as trends without over-interpreting their significance. To further support interpretation, we have now computed effect sizes (Hedges’ g) for all subtle effects. In the revised manuscript, all interpretations of non-significant effects have been reworded to avoid overstatement.

      (b) The n number for both genotypes should be increased. In several experiments (e.g., Figure 1D, 2E), one animal appears to be an outlier. Considering the subtle differences between genotypes, such an outlier could affect the statistical results and subsequent interpretations.

      The number of mice used per genotype is consistent with standard practices in behavioral studies of sensory processing. To complement statistical analyses and account for small sample sizes, we have calculated effect sizes (Hedges’ g) for all subtle or trend-level effects (p ≈ 0.05–0.1), providing a measure of effect magnitude independent of sample size.
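
      For reference, Hedges' g is Cohen's d computed with the pooled standard deviation and multiplied by a small-sample bias correction, J ≈ 1 - 3/(4(n1 + n2) - 9). The sketch below is a minimal illustration assuming two independent groups (for example, per-mouse values for each genotype); the function name hedges_g is hypothetical. The sign depends on which group is passed first, which is why magnitudes are often reported.

```python
import math
from statistics import mean, variance


def hedges_g(a, b):
    """Bias-corrected standardised mean difference between groups a and b."""
    n1, n2 = len(a), len(b)
    # Pooled SD from the two sample variances (n - 1 denominators).
    pooled_sd = math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b))
                          / (n1 + n2 - 2))
    d = (mean(a) - mean(b)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    return d * j
```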

      As the reviewer correctly noted, no animals were excluded as outliers, since observed variability reflects true biological differences rather than experimental or technical errors. In the revised manuscript, we re-examined all datasets for potential outliers, and when identified, analyses were performed both with and without the data point. Any results sensitive to single animals are explicitly reported. This procedure is now detailed in the Methods section (lines 675-679).

      (c) The large number of comparisons across salience levels, categories, and trial histories raises concern for false positives. The manuscript does not clearly state how multiple comparisons were controlled.

      We thank the reviewer for highlighting this important point. To control for false positives arising from multiple comparisons, we applied the Bonferroni correction. This information has been added to the Methods section (line 682) to ensure transparency and reproducibility of all statistical tests.
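
      A Bonferroni correction multiplies each p-value by the number of comparisons m (capping at 1), which is equivalent to testing each comparison at α/m. A minimal sketch; the function name is hypothetical, and how the family of comparisons (and hence m) is defined is left to the analyst.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, rejection flags) for one family of tests."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]
```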

      (d) The data in Figure 5, shown as separate panels per indentation value, are analyzed separately as t-tests or Mann-Whitney tests. However, individual comparisons are inappropriate for this type of data, as these are repeated stimulus applications across a given session. The data should be analyzed together and post-hoc comparisons reported. Given the very subtle difference in miss rates across control and mutant mice for 'low-salience' stimulus trials, this is unlikely to be a statistically meaningful difference when analyzed using a more appropriate test.

      We thank the reviewer for raising this point and acknowledge the oversight. In the revised manuscript, miss rates for high- and low-salience stimuli were reanalyzed using a mixed-effects linear model, which appropriately accounts for repeated measurements within sessions (Fig. 5; Results section: lines 320-340). This analysis confirmed that Fmr1-/y mice exhibit increased miss rates specifically at the 12 µm amplitude, with the effect disappearing at the higher low-salience amplitude (18 µm). Post-hoc comparisons with Bonferroni correction revealed a strong trend toward increased misses at 12 µm (t-test: t = -2.8437, p = 0.058, Hedges’ g = 1.23), while no significant differences were found at other amplitudes. The Methods section has been updated to detail this statistical approach for analyzing miss rates (lines 686-687).

      (4) Emphasis on theoretical models:

      The paper leans heavily on theories such as Adaptive Resonance Theory, Load Theory of Attention, and Weak Central Coherence, but the data do not actually test these frameworks in a rigorous way. The discussion should be reframed to highlight the potential relevance of these frameworks while acknowledging that the current data do not allow them to be assessed.

      As mentioned above, our goal was not to directly test theoretical frameworks such as Adaptive Resonance Theory, Load Theory of Attention, or Weak Central Coherence, but rather to provide a context for interpreting our behavioral findings. In the revised manuscript, we have removed references to the Load Theory from the Results section and reframed the Discussion to emphasize that our results are consistent with certain predictions from these cognitive theories, without implying that the experiments directly assessed them. This clarifies that the interpretations are based on observed behavioral patterns, while still acknowledging the potential relevance of these frameworks to better understand tactile perception and cognition in autism.

      Reviewer #3 (Public review):

      Summary:

      Developing consistent and reliable biomarkers is critically important for developing new pharmacological therapies in autism spectrum disorders (ASDs). Altered sensory perception is one of the hallmarks of autism and has been recently added to DSM-5 as one of the core symptoms of autism. Touch is one of the fundamental sensory modalities, yet it is currently understudied. Furthermore, there seems to be a discrepancy between different studies from different groups focusing on tactile discrimination. It is not clear if this discrepancy can be explained by different experimental setups, inconsistent terminology, or the heterogeneity of sensory processing alterations in ASDs. The authors aim to investigate the interplay between tactile discrimination and cognitive processes during perceptual decisions. They have developed a forepaw-based 2-alternative choice task for mice and investigated tactile perception and learning in Fmr1-/y mice.

      Strengths:

      There are several strengths of this task: translational relevance to human psychophysical protocols, including controlled vibrotactile stimulation. In addition to the experimental setup, there are also several interesting findings: Fmr1-/y mice demonstrated choice consistency bias, which may result in impaired perceptual learning, and enhanced tactile discrimination in low-salience conditions, as well as attentional deficits with increased cognitive load. The increase in the error rates for low salience stimuli is interesting. These observations, together with the behavioral design, may have a promising translational potential and, if confirmed in humans, may be potentially used as biomarkers in ASD.

      We appreciate the reviewer’s positive assessment regarding our study’s translational value and the importance of our behavioral findings.

      Weaknesses:

      Some weaknesses are related to the lack of the original raster plots and density plots of licks under different conditions, learning rate vs time, and evaluation of the learning rate at different stages of learning. Overall, these data would help to answer the question of whether there are differences in learning strategies or neural circuit compensation in Fmr1-/y mice. It is also not clear if reversal learning is impaired in Fmr1-/y mice.

      We thank the reviewer for these helpful suggestions. We agree that visualizing behavioral patterns, such as raster and density plots of licks, as well as learning rate over time, provides additional insights into learning dynamics. In response, we have added these analyses to the revised manuscript (Fig. S1, Fig. S2), which illustrate both individual and group-level learning trajectories and trial-by-trial licking patterns.

      There was no assessment of reversal learning in Fmr1-/y mice in this study. While this is an interesting and important question, motivated by previous preclinical and clinical findings, it falls outside the scope of the current manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Main Comments

      (1) This study addresses the important question of how top-down cognitive processes affect tactile perception in autism, specifically in the Fmr1-/y genetic mouse model of autism vs. WT controls. Using a 2AFC tactile task in behaving mice, the study investigated multiple aspects of perceptual processing, including perceptual learning, stimulus categorization and discrimination, as well as the influence of prior experience and attention. The experiments seem well performed, with interesting results. However, certain aspects of the analysis were not clearly explained, which at times made the study difficult to follow.

      Please see specific details in the comments below.

      (2) To measure sensitivity, the authors present many comparisons of d' - sometimes between pairs of stimuli (or sometimes even for a single stimulus level).

      (a) Firstly, the calculation of d' for a single stimulus value is unclear (because the same proportion of high/low choices for a given stimulus can result from shifts in bias/criterion).

      We agree with the reviewer that calculating d′ for a single stimulus conflates sensitivity with response bias/criterion differences. For this reason, the panels showing d′ for individual stimulus amplitudes during training (Fig. 1F and 1G in the original manuscript) have been removed from the manuscript.

      In addition, we revised our d’ (Fig. 1E) and criterion (Fig. 2A) calculations, treating the high-amplitude stimuli as “signal” and the low-amplitude stimuli as “noise”, following Signal Detection Theory. The formulas used in the revised manuscript take into account correct responses to high-amplitude stimuli (hits) and incorrect responses to low-amplitude stimuli (false alarms) to calculate the sensitivity and bias of the mice during discrimination in the training period.

      Sensitivity (d′) is now computed as:

      d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)

      and the criterion (c) as:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]
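      As a minimal sketch of the two formulas above, d′ and c can be computed from the proportion of right licks on high-amplitude trials (hits) and on low-amplitude trials (false alarms), with z() taken as the inverse standard-normal CDF. The clipping of rates of exactly 0 or 1 is our assumption (the response does not state how such rates were handled), and the function name and example rates are hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf  # inverse standard-normal CDF, the z() in the formulas


def dprime_and_criterion(hit_rate: float, fa_rate: float,
                         eps: float = 1e-3) -> tuple[float, float]:
    """Return (d', c) from P(lick right | high) and P(lick right | low).

    Rates of 0 or 1 give infinite z-scores, so they are clipped to
    [eps, 1 - eps]; this correction is an illustrative assumption.
    """
    h = min(max(hit_rate, eps), 1 - eps)
    f = min(max(fa_rate, eps), 1 - eps)
    d_prime = z(h) - z(f)
    criterion = -0.5 * (z(h) + z(f))
    return d_prime, criterion


# Hypothetical mouse: 80% right licks on high-amplitude, 20% on low-amplitude trials.
d, c = dprime_and_criterion(0.80, 0.20)
print(f"d' = {d:.3f}, c = {c:.3f}")
```

      A negative c indicates a liberal bias toward the right ("high") lickport, a positive c a bias toward the left.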

      (b) Secondly, while calculating d' makes sense for comparing two stimulus levels (like in the training condition), in the test condition (with a spread of stimuli), this becomes a little tedious - at times difficult to follow and unclear.

      I would have thought that sensitivity (at least for overall performance) would be better compared using data from all the stimuli - e.g. either using:

      (i) the sigma of the psychometric curve (although the downside of that approach is that it ignores history effects), or

      (ii) a logistic regression of the choices given the stimuli, where the weights assigned to stimulus magnitude indicate sensitivity (the advantage of that approach is that history effects, like the previous trials/choices, can be used as regressors in the model, so it can quantify history effects simultaneously). This could even be expanded to a GLMM (mixed effects for different mice).

      We thank the reviewer for this very valuable feedback. Indeed, during the testing phase, we calculated sensitivity d’ to probe the overall categorization sensitivity (Fig. 3H).

      (i) This analysis was only complementary to the psychometric curves (fitted on the rightward lick rate for each stimulus amplitude using a general linear model – Fig. 3A). As the reviewer proposes, we had already calculated the sigma of the psychometric curve (Fig. 3G, slope) to assess categorization precision. Sensitivity calculations have now also been revised using the aforementioned formula (d' = z(lick right|high amplitude stimulus) - z(lick right|low amplitude stimulus)).

      (ii) To incorporate history effects, we implemented generalized linear models (GLMs) with a binomial distribution to predict high-salience licks (right-lick choices) based on the current stimulus, trial history, genotype, and their interactions. A main-effects model included current stimulus, previous stimulus, previous outcome, previous choice, and genotype; it was followed by a model with interaction terms to assess genotype-specific modulation of history effects. These analyses are now presented in the new Figure 6.

      The resulting coefficients are shown in Fig. 6A. As expected, decisions were primarily driven by current stimulus amplitude (Fig. 6A, B). Both genotypes displayed a tendency to repeat previous choices (Fig. 6A, C), while previous reward outcomes did not influence current choice (Fig. 6A, D). Notably, stimulus amplitude history showed genotype-specific effects: WT mice were negatively influenced by the previous stimulus, whereas Fmr1-/y mice remained unaffected (Fig. 6A, E).

      To clearly visualize these findings, we plotted psychometric curves and marginal effects accounting for current stimulus, previous choice, previous outcome, and previous stimulus (Fig. 6B-E). These analyses are now fully integrated into the Methods (lines 688-702), Results (Fig. 6, lines 341-369), and Discussion (lines 469-479) sections of the revised manuscript.
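      The main-effects model described in (ii) can be sketched as a logistic regression (a binomial GLM; the logit link is our assumption) fitted by iteratively reweighted least squares on simulated trials. This is not the authors' analysis code: the simulated weights, the 12-26 µm amplitude grid in 2 µm steps, the ±1 coding of previous choice and outcome, and the exclusion of misses are all hypothetical. Genotype and genotype × history interactions would enter as further columns of X (e.g., a 0/1 genotype indicator and its products with the history columns).

```python
import numpy as np


def fit_logistic_irls(X: np.ndarray, y: np.ndarray, n_iter: int = 25) -> np.ndarray:
    """Maximum-likelihood weights of a binomial GLM with logit link (Newton/IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted P(right lick)
        w = p * (1.0 - p)                      # binomial variance = IRLS weights
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])          # Fisher information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# --- Simulated session (all values hypothetical) ---
rng = np.random.default_rng(0)
n = 5000
amps = rng.choice(np.arange(12, 28, 2), size=n).astype(float)  # assumed 12-26 µm set
stim = (amps - 19.0) / 4.0     # centred on an assumed 19 µm category boundary
is_high = amps > 19.0          # right lick is correct for 20-26 µm
# Columns: intercept, current stim, previous stim, previous choice, previous outcome
w_true = np.array([0.0, 2.0, -0.5, 0.6, 0.0])

X = np.zeros((n, 5))
y = np.zeros(n)                # 1 = right lick, 0 = left lick (misses not modelled)
for t in range(n):
    if t > 0:
        prev_stim = stim[t - 1]
        prev_choice = 2.0 * y[t - 1] - 1.0                   # +1 right, -1 left
        rewarded = (y[t - 1] == 1.0) == is_high[t - 1]
        prev_outcome = 1.0 if rewarded else -1.0             # +1 rewarded, -1 not
    else:
        prev_stim = prev_choice = prev_outcome = 0.0
    X[t] = [1.0, stim[t], prev_stim, prev_choice, prev_outcome]
    p_right = 1.0 / (1.0 + np.exp(-X[t] @ w_true))
    y[t] = float(rng.random() < p_right)

w_hat = fit_logistic_irls(X, y)
print("estimated weights:", np.round(w_hat, 2))
```

      Because history terms are regressors in the same model, their weights are estimated jointly with the stimulus weight, which is the advantage the reviewer points out over fitting the psychometric curve alone.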

      (3) I find some of the terminology used confusing/misleading:

      (a) The term "Categorization thresholds" can be misleading: in psychometric curves, "thresholds" often refer to the sigma (SD) of the fitted curve, which is used to measure sensitivity (inversely related). Here, I think that the meaning is in terms of the PSE/criterion. Perhaps the terminology can be improved to prevent confusion on this matter. E.g., I think that here the authors mean a measure of bias/criterion/PSE or similar. Correct? Not really a perceptual "threshold".

      We thank the reviewer for pointing this out. In our analysis, the term “threshold” referred to the inflection point (i.e., the midpoint parameter μ) of the fitted logistic psychometric function used to categorize high- versus low-amplitude stimuli. We agree with the reviewer that the term “Categorization bias” could also be used. We originally avoided this term so as not to confuse readers, since we refer to the criterion (signal detection theory) as “response bias”. However, because the term “threshold” may be confusing as well, we adopted the term “Categorization bias” in the updated version of the manuscript (lines 282, 284, 637-638, 785, Fig. 3F).

      (b) Similarly, I think that "Categorization accuracy" can be misleading when describing the slope of the psychometric curve. Performance could have a steep slope but still be quite inaccurate (e.g., if there is a big bias). Perhaps "precision" is a better description of the slope?

      We thank the reviewer for this suggestion. The slope of the psychometric curve is often referred to as “sensitivity” in the literature (Carandini and Churchland, 2014), but in our original manuscript we used the term “accuracy” to avoid confusion with the d′ measure from signal detection theory. As the reviewer suggested, we have replaced this term with “precision” in the revised manuscript and figures (lines 282, 284, 637-638, 786, Fig. 3G).
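      The reviewer's point in (b) can be illustrated with a logistic psychometric function in which mu is the inflection point (the "categorization bias", where P(right lick) = 0.5) and slope sets the steepness (the "precision"). The sketch assumes a stimulus set of 12-26 µm in 2 µm steps with the category boundary at 19 µm, between the 18 µm low and 20 µm high stimuli; the parameter values are hypothetical. A steep but shifted curve yields lower expected accuracy than a shallower, unbiased one.

```python
import math


def p_right(amp: float, mu: float, slope: float) -> float:
    """Logistic psychometric function: P(right lick) at a given amplitude (µm).

    mu is the inflection point (P = 0.5); slope controls steepness.
    """
    return 1.0 / (1.0 + math.exp(-slope * (amp - mu)))


def expected_accuracy(mu: float, slope: float, boundary: float = 19.0) -> float:
    """Mean P(correct) over an assumed 12-26 µm set in 2 µm steps.

    Right licks are correct above the boundary, left licks below it.
    """
    amps = range(12, 27, 2)
    correct = [p_right(a, mu, slope) if a > boundary else 1.0 - p_right(a, mu, slope)
               for a in amps]
    return sum(correct) / len(correct)


steep_unbiased = expected_accuracy(mu=19.0, slope=2.0)
steep_biased = expected_accuracy(mu=23.0, slope=2.0)      # same precision, shifted bias
shallow_unbiased = expected_accuracy(mu=19.0, slope=0.5)  # lower precision, no bias
print(f"{steep_unbiased:.3f} {steep_biased:.3f} {shallow_unbiased:.3f}")
```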

      Minor Comments

      (1) Abstract: "determines how autistic individuals engage" - there are other factors too. So, I think that "determines" is a little strong. Perhaps "influences" is more appropriate.

      We have incorporated the reviewer’s suggestion (line 7).

      (2) Figure 1 F, G. On the one hand, d' is defined as "sensitivity (d') in discriminating between high- and low-salience stimuli" - that seems to make sense. But then d' is also calculated and presented for each salience level on its own. How was this done? Namely, percent correct (or proportion of choices high/low salience) could be affected by criterion shifts as well as sensitivity. This makes calculating the d' for a single (low or high) salience stimulus ambiguous. So, how did the authors reach this conclusion?

      We agree that calculating d′ for a single stimulus amplitude is ambiguous, because the resulting value conflates true stimulus sensitivity with shifts in response bias or criterion. Consequently, all analyses and figures reporting d′ for individual high- or low-salience stimuli (e.g., Figures 1F and 1G) have been removed from the revised manuscript.

      In the updated analyses, d′ is calculated only across high- versus low-salience stimuli, following standard Signal Detection Theory procedures, ensuring that it reflects true discriminability between the two categories (Methods, line 631; Figure 1E).

      (3) "Our results showed comparable correct choice rates in Fmr1-/y and WT mice (Fig. 1H), for both high- and low-salience stimuli (Fig. S1C-D). In contrast, Fmr1-/y mice presented a significantly higher rate of incorrect choices (Fig. 1I)." - aren't correct choices and incorrect choices complementary (i.e., 1-x) in a 2AFC? How is this possible?

      We thank the reviewer for pointing this out. Correct and incorrect choices are complementary at the single-trial level if miss trials are excluded. However, in our analyses, correct and incorrect choice rates were calculated by normalizing the number of correct or incorrect responses to the total number of trials (including misses), which breaks this complementarity and contributes to the differences observed in Fig. 1H–I. This was clarified in the Methods section (lines 616-617). Moreover, incorrect responses were less frequent than correct ones and are thought to reflect lapses, response bias, and impulsive responding rather than sensory performance, making them more sensitive to genotype-dependent differences in behavioral control. Based on this concept, we further examined whether incorrect choices were preferentially associated with specific stimulus amplitudes and assessed response bias and prior effects.
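      A toy calculation with hypothetical trial counts shows why, once misses are kept in the denominator, the correct and incorrect choice rates no longer sum to 1: instead the correct, incorrect, and miss rates together sum to 1, so two animals can share a correct-choice rate while differing in incorrect-choice rate.

```python
def choice_rates(n_correct: int, n_incorrect: int, n_miss: int) -> dict[str, float]:
    """Rates normalised to all trials, with miss trials included in the denominator."""
    total = n_correct + n_incorrect + n_miss
    return {
        "correct": n_correct / total,
        "incorrect": n_incorrect / total,
        "miss": n_miss / total,
    }


# Two hypothetical mice, 100 trials each, with identical correct-choice rates.
mouse_a = choice_rates(n_correct=70, n_incorrect=10, n_miss=20)
mouse_b = choice_rates(n_correct=70, n_incorrect=25, n_miss=5)
print(mouse_a)
print(mouse_b)
```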

      (4) The conclusion that "they showed a strong trend toward reduced sensitivity for low-salience stimuli (Fig. 1G)" has a confound - it could be that there was a criterion shift (rather than differences in sensitivity)?

      We agree with the reviewer that the previously reported trend in sensitivity for low-salience stimuli could reflect a criterion shift rather than true differences in sensory sensitivity. Because sensitivity estimates for individual stimulus amplitudes are not well-defined in a 2AFC framework, we have removed the sensitivity calculations for high- and low-salience stimuli considered independently. Instead, we now present salience-specific differences using correct and incorrect response rates for each stimulus amplitude, which more directly capture performance differences without assuming changes in sensory sensitivity (Fig. 1G-I, S1E-F).

      (5) Figure 3D, E - I stumbled over this in comparison to Figure 3B, C. That is because (a) In D and E, the authors compare right-lick responses (reporting high salience) to stimuli of 12 μm and 14 μm amplitude (Figure 3D) and low-salience lick rates for the same (Figure 3E). I would have thought that these approaches are simply complementary (1-x) - see related minor question above/below. So, what is the advantage of presenting them both?

      We presented both panels to clarify the source of the observed differences in performance. Specifically, showing right-lick responses (reporting high-salience choices) alongside low salience lick rates allows us to distinguish whether reduced high-salience reporting arises from an actual shift in choice (e.g., increased leftward licking) versus an increase in miss trials at the lowest amplitude (12 µm). By presenting both, we can demonstrate that the effect is primarily driven by an increase in leftward choices rather than by missed responses, providing a more precise interpretation of behavioral changes. The complementary analysis for leftward choices has now been moved to the supplemental material (Fig. S5A) and the reason for this analysis has been clarified in the Results (lines 275-276).

      (b) In B and C, the authors compare two differences in stimulus magnitude (2 and 4 μm), but in Figure 3D and E, only one difference (2 μm) from two perspectives. I was expecting a comparison with stimuli differing by 4 μm in amplitude (comparable to the high stimulus comparison of 26 μm vs. 22 μm stimuli).

      We have indeed analyzed the 12 μm versus 16 μm stimulus pair, which corresponds to a 4 μm difference and is reliably discriminated by both genotypes. In the original manuscript, we did not include this comparison because differences between genotypes were already evident at a 2 μm amplitude difference. Based on the reviewer’s suggestion, we have now included the 12 μm vs. 16 μm comparison in the revised manuscript (Results, lines 270-272; Fig. 3E) to provide a complementary perspective consistent with the high-salience comparisons (26 μm vs. 22 μm).

      (c) "Sensitivity d' for high- and low-salience stimuli was calculated based on the Correct and Incorrect choice rate for high- and low-salience stimuli respectively." How were trials for which the animal did not respond taken into account? Were these part of the denominator? Or were these excluded when calculating proportions? (related to the Q regarding Figure 3 D,E above).

      Indeed, the Miss trials were part of the denominator. This is now clarified in the Methods section (line 631).

      (d) "c = d'(high)- d'(low)." - I did not understand this fully. There were several high and several low stimuli - so how were these calculated? Pooled for high and pooled for low? Per stimulus difference?

      This was indeed calculated for pooled high and low amplitudes during testing. In the revised manuscript, criterion c has been recalculated based on the average correct high rate (for stimuli of 20-26 µm amplitude) and average incorrect low rate (for stimuli of 12-18 µm amplitude), using the same formula as in the analysis of the training dataset:

      c = −1/2 × [z(lick right|high amplitude stimulus) + z(lick right|low amplitude stimulus)]

      Pooling across amplitudes allows us to obtain a single summary measure of response bias toward the right lickport, independent of stimulus discriminability. This approach is consistent with standard signal detection theory practices when multiple stimulus levels are present.
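      A minimal sketch of the pooling step, assuming that per-amplitude right-lick rates (misses in the denominator) are averaged separately over the 20-26 µm and 12-18 µm stimuli before the z-transform, as described above, and that each pooled rate lies strictly between 0 and 1. The function name and example rates are hypothetical.

```python
from statistics import NormalDist, fmean

z = NormalDist().inv_cdf  # inverse standard-normal CDF


def pooled_criterion(right_rate_by_amp: dict[int, float], boundary: float = 19.0) -> float:
    """c = -1/2 [z(mean right-lick rate, high) + z(mean right-lick rate, low)].

    right_rate_by_amp maps amplitude (µm) to the proportion of right licks.
    Amplitudes above `boundary` are pooled as high (20-26 µm), those below as
    low (12-18 µm). Pooled rates of exactly 0 or 1 would make z() fail.
    """
    high = fmean(r for a, r in right_rate_by_amp.items() if a > boundary)
    low = fmean(r for a, r in right_rate_by_amp.items() if a < boundary)
    return -0.5 * (z(high) + z(low))


# Hypothetical unbiased animal: pooled high rate 0.8, pooled low rate 0.2, so c is ~0.
rates = {12: 0.05, 14: 0.15, 16: 0.25, 18: 0.35, 20: 0.65, 22: 0.75, 24: 0.85, 26: 0.95}
print(f"c = {pooled_criterion(rates):.3f}")
```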

      If the inter-trial interval is 5-10s, how is a 5s timeout a punishment?

      The 5 s timeout serves as a punishment by temporarily delaying access to the next trial and potential reward, thereby reducing the overall reward rate. Even though the inter-trial interval (ITI) varies between 5 and 10 s, the timeout increases the effective delay before the next opportunity to earn a reward, discouraging incorrect responses. This is consistent with standard operant conditioning procedures, where brief timeouts act as negative consequences without being overly severe. Across most trials, the timeout effectively reduces expected reward rate, though its impact is minimal when the ITI is already long.
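      The reward-rate argument can be made concrete with a back-of-the-envelope calculation. It assumes, beyond what the response states, that the 5 s timeout is simply added to the ITI after each incorrect choice, that misses are neither rewarded nor punished, and that stimulus and response-window durations can be ignored unless supplied; the choice rates are hypothetical.

```python
def reward_rate(p_correct: float, p_incorrect: float, mean_iti_s: float,
                timeout_s: float = 5.0, trial_s: float = 0.0) -> float:
    """Expected rewards per second under the assumptions stated above.

    The timeout is added to the ITI after each incorrect choice; trial_s
    (stimulus plus response window) is a hypothetical placeholder.
    """
    mean_cycle_s = trial_s + mean_iti_s + p_incorrect * timeout_s
    return p_correct / mean_cycle_s


# Hypothetical animal: 70% correct, 20% incorrect, 10% misses.
for iti in (5.0, 7.5, 10.0):
    baseline = reward_rate(0.7, 0.2, iti, timeout_s=0.0)
    with_timeout = reward_rate(0.7, 0.2, iti)
    drop_pct = 100.0 * (1.0 - with_timeout / baseline)
    print(f"ITI {iti:4.1f} s: reward rate falls by {drop_pct:.1f}% with a 5 s timeout")
```

      With these numbers, the timeout lowers the expected reward rate by about 17% at a 5 s ITI and about 9% at a 10 s ITI: the penalty is always present but proportionally smaller when the ITI is long.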

      Reviewer #2 (Recommendations for the authors):

      Task-related questions:

      (1) What evidence is there that the 40 Hz, 12 μm stimulus is "low salience" while the 40 Hz, 26 μm stimulus is "high salience"? This seems like an arbitrary distinction without showing sensitivity curves across a group of animals. Better definitions of the stimuli and the actual forces applied are necessary.

      We thank the reviewer for this comment. Based on our previous work (Semelidou et al., bioRxiv; Accepted in Advanced Science), both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli are clearly suprathreshold. In the present study, however, stimulus salience is defined in a relative and operational manner within this suprathreshold range.

      Specifically, analysis of miss trials (Fig. S3E) shows that the 40 Hz, 12 μm stimulus consistently elicited a higher proportion of missed responses compared to the 40 Hz, 26 μm stimulus across animals, indicating lower behavioral performance for the lower-amplitude stimulus. We therefore refer to the 12 μm stimulus as “low salience” and the 26 μm stimulus as “high salience” to denote relative differences in perceptual strength and attentional engagement within the suprathreshold range, rather than differences in detectability or absolute sensory sensitivity. This definition has been clarified in the Methods (lines 583-587) and Results sections (lines 115-119; lines 225-227).

      (2) Sensitivity curves/detection thresholds for each mouse should be included in the study.

      We thank the reviewer for this suggestion. Sensitivity curves and detection thresholds for low-amplitude and low-frequency vibrotactile forepaw stimulation have been systematically characterized in our previous study (Semelidou et al., bioRxiv, Accepted in Advanced Science). In that work, we demonstrated that stimuli with similar amplitudes and even lower frequency (10 Hz) than those used in the present study are reliably detectable by mice, confirming that both the 40 Hz, 12 µm and 40 Hz, 26 µm stimuli fall within the suprathreshold range.

      Because the goal of the present study was not to determine absolute detection thresholds but rather to examine discrimination and categorization performance within a suprathreshold range, we did not re-establish full psychometric detection curves for each mouse.

      We have clarified this rationale in the revised manuscript (Results, lines 108-113; Methods, lines 577-579).

      (3) What force is being applied during stimulus presentations? 12 or 26 μm does not provide enough information about the stimuli applied. What are the physical parameters of the indenter? What material, what tip size?

      Vibrotactile stimuli were delivered to the forepaw via a piezoelectric actuator. A 12.7 mm stainless steel post (ThorLabs) was mounted on the actuator vertically and a 0.6 mm stainless steel rod (ThorLabs) was clamped horizontally onto this post. The horizontal rod served as the contact bar on which the animal rested its right forepaw.

      Stimuli were sinusoidal vibrations at 40 Hz with peak-to-peak displacements of 12 μm (low salience) or 26 μm (high salience). The actuator displacement was calibrated prior to experiments to ensure accurate vibration amplitudes.

      Animals were positioned in the setup to ensure stable and consistent forepaw contact with the rod delivering the vibration. Pilot experiments with an extra sensor to monitor forepaw placement confirmed that the mice did not remove their forepaws from the bar before stimulus delivery. All this information is now added in the Methods section (lines 552-555, 580-582).
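      To relate the stated peak-to-peak displacements to other physical quantities, the stimulus can be modelled as an ideal sinusoid x(t) = (A_pp/2)·sin(2πft). This sketch assumes perfect sinusoidal motion of the contact rod and an arbitrary 20 kHz sampling rate; it yields peak displacement and peak velocity, but not contact force, which depends on the mechanical load of the forepaw and is not reported here.

```python
import math


def vibration_waveform(freq_hz: float, p2p_um: float, duration_s: float,
                       fs_hz: float = 20_000.0) -> list[float]:
    """Sampled sinusoidal displacement (µm) with the given peak-to-peak amplitude."""
    amp_um = p2p_um / 2.0                  # peak amplitude is half the peak-to-peak value
    n = int(round(duration_s * fs_hz))
    return [amp_um * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


def peak_velocity_mm_s(freq_hz: float, p2p_um: float) -> float:
    """Peak velocity of x(t) = A sin(2*pi*f*t) is 2*pi*f*A, converted from µm/s to mm/s."""
    return 2 * math.pi * freq_hz * (p2p_um / 2.0) / 1000.0


for p2p in (12.0, 26.0):
    wave = vibration_waveform(40.0, p2p, duration_s=0.5)
    print(f"{p2p:.0f} µm p-p: sampled p-p {max(wave) - min(wave):.2f} µm, "
          f"peak velocity {peak_velocity_mm_s(40.0, p2p):.2f} mm/s")
```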

      (4) Only one vibration stimulus was used (40 Hz) - this preferentially activates specific subsets of low-threshold mechanoreceptors and not others. A range of vibrotactile stimuli (with varying frequencies) would be more useful. From this limited range of stimuli, it is difficult to assess whether the findings would extrapolate to other types of stimuli.

      We agree that using a single vibration frequency limits the generalization of our findings across the full range of mechanoreceptor subtypes and vibrotactile stimulus conditions. In the present study, we deliberately focused on amplitude discrimination within the flutter range (<50 Hz), as this frequency preferentially activates subsets of low-threshold mechanoreceptors relevant for flutter perception and is commonly used in clinical studies of tactile amplitude discrimination (Puts et al., 2014, 2017; Asaridou et al., 2022). By holding frequency constant and varying only amplitude, we were able to isolate amplitude-dependent perceptual and decision-making processes while minimizing frequency-dependent variability and to facilitate direct translational comparisons with human studies using similar flutter stimuli.

      We acknowledge, however, that extending the paradigm to additional, higher frequencies would help determine whether the observed effects generalize across mechanoreceptor channels. We have now added this point as a future direction in the Discussion section (lines 510-514).

      (5) The methods indicate that during the implementation of the water-restriction protocol, mice had access to a solid water supplement in their home cage. How did they control for how much water supplement was consumed by each mouse before the testing sessions?

      We thank the reviewer for raising this point. The solid water supplement was divided into premeasured individual portions, and each mouse received its allotted amount only after the daily training/testing session. Daily body weight measurements were used to monitor hydration and ensure that all animals maintained stable body weight. If necessary, supplemental water was adjusted to maintain animals within the approved weight range. This procedure is now described in the Methods section (lines 567-571).

      (6) A control version of the test, perhaps using a different sensory modality, would be useful for making conclusions.

      We agree that testing other sensory modalities would provide a useful control for assessing the generalizability of the observed effects. However, in the present study, we intentionally focused on the tactile modality, as touch has been shown to play a critical role in autism across sexes and to predict other core behavioral symptoms. This makes touch particularly relevant for investigating translational mechanisms in this model.

      By specifically targeting tactile perception, we aimed to investigate the link between sensory discrimination, decision-making, and cognitive modulation within a modality that is strongly implicated in autism. Previous studies in autistic individuals have demonstrated similar interactions between cognitive processes and perceptual decision-making in the visual domain, suggesting that such effects may not be modality-specific. Nevertheless, extending this paradigm to additional sensory systems would be valuable to directly test whether comparable cognitive influences on perception generalize across modalities. We have now incorporated this perspective as a future direction in the Discussion section (lines 514-518).

      Reviewer #3 (Recommendations for the authors):

      There are several questions:

      (1) It is important to show stimulus intensity-response curves representing tactile responses for both WT and Fmr1-/y mice.

      We thank the reviewer for this important comment. Detection sensitivity curves for low-amplitude and low-frequency vibrotactile stimulation of the forepaw have been characterized in detail in our previous study (Semelidou et al., bioRxiv; now accepted in Advanced Science). In that work, we showed that stimuli at or above 8 µm amplitude and 10 Hz frequency are reliably detected by both WT and Fmr1-/y mice.

      Based on these findings, the current study employed vibrotactile stimuli at a higher frequency (40 Hz) and amplitudes of 12 µm and above, ensuring that all stimuli were well within the suprathreshold range for both genotypes. This experimental choice was made to specifically probe discrimination, categorization, and decision-making processes, rather than basic sensory detection. As a result, the behavioral effects reported here cannot be attributed to differences in stimulus detectability.

      We have clarified this rationale in the revised manuscript to make explicit that the absence of full intensity-response curves in the current study reflects a deliberate focus on suprathreshold perceptual and cognitive processes rather than sensory threshold differences (Results, lines 108-113; Methods, lines 577-579).

      (2) There is no difference in the time it takes to learn the task between WT and Fmr1-/y mice. But how does the learning rate curve look? Is there a difference in the slope between WT and Fmr1-/y early vs late into learning?

      We thank the reviewer for this suggestion. To directly address whether learning dynamics differed between genotypes, we analyzed learning curves across training.

      We first computed the correct choice rate per day for each animal (Fig. S2A) and fit a mixed-effects model including training day, genotype, and their interaction. This analysis revealed no genotype differences in baseline performance or learning rate, with a minimal Genotype × Day interaction (Fig. S2A-top, Fig. S2C).

      We additionally computed the slope of the learning curve for each individual, which also showed no difference across genotypes (Fig. S2B). In addition, within-animal day-to-day performance variability was also comparable across groups (Fig. S2A-bottom, S2D).

      These analyses indicate that WT and Fmr1-/y mice exhibit similar learning trajectories during training. The learning curves are now included in Figure S2, described in the Results (lines 140–151) and detailed in the Methods (lines 644-658).

      (3) It would be useful to see raster plots of licks for different trials and the corresponding lick density plots for early vs late trials.

      We thank the reviewer for this suggestion. To visualize trial-by-trial behavior, we included example lick traces from an early 100-trial session and a late 100-trial session, alongside the corresponding raster plots of licks (Fig. S1A–B).

      (4) Consistent with the first question, examples of intermediate learning stages would help gain more insight into how both WT and Fmr1-/y mice learn.

      In line with the reviewer’s suggestion, we examined whether WT and Fmr1-/y mice showed different performance during intermediate stages of learning. To this end, we defined the middle three days of each animal’s training period as the intermediate learning phase. We compared both the mean correct-choice rate and individual learning slopes across this interval. Statistical analyses revealed no significant genotype differences in either measure, indicating comparable performance and learning dynamics during the intermediate phase of training (lines 152-156).

      (5) How does the learning rate change with increased cognitive load for both WT and Fmr1-/y mice?

      We thank the reviewer for this question. While our experimental design did not include a manipulation of cognitive load during the learning phase itself, we assessed whether increased cognitive load affected performance by analyzing behavior on the first day of testing, when animals were required to categorize and discriminate among a larger set of stimuli compared to training.

      Using performance on the training stimuli during this first testing session as a proxy, we found no significant difference between WT and Fmr1-/y mice in correct choice rate (Author response image 1). This indicates that increased cognitive load did not differentially affect performance on familiar stimuli across genotypes at this stage.

      Because this analysis does not reflect learning rate per se, but rather performance under increased task demands after learning had already occurred, we did not incorporate it into the main Results section. Instead, it is presented here to directly address the reviewer’s question.

      Author response image 1.

      Correct choice rate for the 12 µm and 26 µm stimuli during the first day of testing when the cognitive load is high.

      (6) How does the learning rate change if the sensory stimuli are more challenging for both WT and Fmr1-/y to detect?

      We thank the reviewer for this question. In the present study, animals were deliberately trained using well-separated, suprathreshold low- and high-salience stimuli to ensure reliable stimulus detection and to avoid confounding learning rate with perceptual difficulty or discrimination limits.

      A recent study (Heimburg et al., 2025) has shown that learning is slower when the difference between the two training stimuli is reduced. Based on these results, we would expect that decreasing the separation between low- and high-salience stimuli would similarly increase training duration for both WT and Fmr1-/y mice, since our results do not indicate any discrimination or categorization deficits in the mouse model of autism. However, directly testing how stimulus difficulty modulates learning rate would require a dedicated manipulation of stimulus spacing during training and was beyond the scope of the current study.

      Editor's note:

      Should you choose to revise your manuscript, if you have not already done so, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and, where appropriate, 95% confidence intervals.

      These should be reported for all key questions and not only when the p-value is less than 0.05 in the main manuscript.

    1. eLife Assessment

      Du et al. present a valuable study examining neural activation in medial prefrontal cortex (mPFC) subpopulations projecting to the basolateral amygdala (BLA) and nucleus accumbens (NAc) during behavioral tasks assessing anxiety, social preference, and social dominance. The strength of the evidence linking in vivo neural physiology to behavioral outcomes was considered solid; however, the slice electrophysiology data and their interpretation were less well received. Overall, the reviewers felt that the revised work provides insight into how distinct mPFC→BLA and mPFC→NAc pathways influence anxiety, exploration, and social behaviors.

    2. Reviewer #1 (Public review):

      Summary:

      It is well known that neurons in the medial prefrontal cortex (mPFC) are involved in higher cognitive functions such as executive planning, motivational processing and internal state mediated decision-making. These internal states often correlate with the emotional states of the brain. While several studies point to the role of mPFC in regulating behavior based on such emotional states, the diversity of information processing in its sub-populations remains less explored. In this study, the authors try to address this gap by identifying and characterizing some of these sub-populations in mice using a combination of projection-specific imaging, function-based tagging of neurons, multiple behavioral assays and ex vivo patch clamp recordings.

      Strengths:

      The authors targeted mPFC projections to the nucleus accumbens (NAc) and basolateral amygdala (BLA). Using the open field task (OFT), the authors identified four relevant behavioral states as well as neurons active while the animal was in the center region ("center-ON neurons"). By characterizing single unit activity and using dimensionality reduction, the authors show differentiated coding of behavioral events at both the projection and functional levels. They further substantiate this effect by showing higher sensitivity of mPFC-BLA center-ON neurons during time spent in the open arms of the elevated plus maze (EPM). The authors then pivoted to the three-chamber social interaction (SI) assay to show that different subsets of neurons encode a preference for social over non-social stimuli. This reveals an interesting diversity in the function of these sub-populations on multiple levels. Lastly, the authors used the tube test as a manipulation of the anxiety state of mice and compared behavior in the OFT and social interaction tasks before and after the test. This experiment revealed that "losers" of the tube test spend less time in the center of the open field while "winners" show a stronger preference for the familiar mouse over the object. Using patch-clamp experiments, the authors also found that "winners" exhibit stronger synaptic transmission in the mPFC-NAc projection while "losers" exhibit stronger synaptic transmission in the mPFC-BLA projection. Given the popularity of the tube test assay in rank determination, this provides useful insights into possible effects on anxiety levels and synaptic plasticity. Overall, the many experiments performed by the authors reveal interesting differences in mPFC neurons relative to their involvement in high or low anxiety behaviors, social preference and social rank.

      Weaknesses:

      The authors have addressed all comments.

    3. Reviewer #2 (Public review):

      Summary:

      The goal of this proposal was to understand how two separate populations of projection neurons from the medial prefrontal cortex, those innervating the basolateral amygdala (BLA) and nucleus accumbens (NAc), contribute to the encoding of emotional behaviors. The authors record the activity of these different neuron classes across three different behavioral environments. They propose that, although both populations are involved in emotional behavior, the two populations have diverging activity patterns in certain contexts. A subset of projections to the NAc appears particularly important for social behavior. They then attempt to link these changes to the emotional state of the animal and changes in synaptic connectivity.

      Strengths:

      The behavioral data build on previous studies supporting distinct behavioral roles for these projection neurons and extend that work by examining the heterogeneity within each projection class across contexts. This is important for understanding the "neural code" within the PFC that contributes to such behaviours and how it is relayed to other brain structures.

      Weaknesses:

      The diversity of neurons mediating these projections and their targeting within the BLA and NAc is not explored. These are not homogeneous structures, so some of the diversity in the findings may relate to targeting of different sub-structures within the BLA or NAc, or to the diversity of projection neuron subtypes that mediate these pathways. This is an important future direction for this work but does not detract from the main finding as reported. The electrophysiological data in Figure 7 have some experimental confounds that make their interpretation challenging.

      Comments on revisions:

      The authors have improved the manuscript somewhat by refining their description of the results. However, the normalized EPSC experiments still do not make much sense. With a higher light intensity or longer LED duration, the EPSC input-output curve will saturate earlier. Similarly, if you record in a highly or poorly labeled slice, or subregion of a slice, responses will emerge at different intensities depending on the number of labeled synapses. There was no standardization in the way these experiments were performed, so an arbitrary post hoc normalisation does not correct for this. In addition, the fibre optic was placed manually above the slice each time, which makes it much harder to determine the actual light intensity delivered to the slice on a cell-by-cell and group-by-group basis.

      I have softened my public statement from "significant experimental confounds" to "some experimental confounds". But the way the experiments were performed does not make the normalized data interpretable. They still argue that normalized EPSCs are relatively larger, and I do not really understand what this means biologically.
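      The saturation argument can be illustrated with a toy model. The sketch below is purely hypothetical and not taken from the manuscript: it assumes EPSC amplitude saturates exponentially with an effective light dose (LED duration multiplied by an assumed opsin "efficiency" factor), and compares two slices whose synapses are identical but whose labeling differs.

```python
import math

def epsc_amplitude(duration_ms, efficiency, max_pa=200.0):
    """Hypothetical saturating input-output model.

    Amplitude rises with effective light dose (duration * efficiency)
    and plateaus at max_pa. 'efficiency' stands in for opsin labeling
    and light delivery; it is an illustrative assumption.
    """
    dose = duration_ms * efficiency
    return max_pa * (1.0 - math.exp(-dose))

def normalized_curve(durations, efficiency):
    """Normalize each response to the largest response in the series."""
    amps = [epsc_amplitude(d, efficiency) for d in durations]
    peak = max(amps)
    return [a / peak for a in amps]

durations = [1, 2, 5, 10]  # LED durations in ms (illustrative)

# Same synapses (same max_pa), different effective light activation.
well_labeled = normalized_curve(durations, efficiency=1.0)
poorly_labeled = normalized_curve(durations, efficiency=0.1)
```

      Under these assumptions the well-labeled slice reaches about 0.63 of its maximum at 1 ms while the poorly labeled slice reaches about 0.15, even though the underlying synapses are identical, which is why post hoc normalization alone cannot separate labeling or light-delivery differences from synaptic differences.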

      The subsequent rise/decay and other measures are now better described. However, they note that the decay constant is larger. This means that the kinetics are slower, not enhanced as they describe.
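      To make this point concrete, assume (as a simplification not stated in the manuscript) that the EPSC decay is well described by a single exponential with time constant $\tau$. The time for the current to fall to half its peak then scales linearly with $\tau$:

```latex
I(t) = I_{\mathrm{peak}}\, e^{-t/\tau}
\quad\Longrightarrow\quad
I(t_{1/2}) = \tfrac{1}{2} I_{\mathrm{peak}}
\;\Rightarrow\;
t_{1/2} = \tau \ln 2 \approx 0.69\,\tau
```

      A larger decay constant therefore means the current takes longer to return to baseline, i.e. slower decay kinetics.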

    4. Author response:

      The following is the authors’ response to the previous reviews

      We sincerely thank the editors and reviewers for their careful evaluation and constructive feedback, which has helped us substantially improve the clarity and rigor of the manuscript. In the revised version, we have clarified the interpretation of the electrophysiological experiments, corrected the labeling of recorded signals as light-evoked EPSCs, and removed statements implying differences in absolute synaptic strength. To address concerns about the interpretation of Fig. 7, we have added quantitative analyses of EPSC kinetics and revised the text to focus on synaptic response dynamics rather than amplitude differences. We have also removed analyses that could cause confusion and expanded the Methods section to provide additional experimental details, including the optogenetic stimulation configuration in slice recordings. Together, these revisions strengthen the interpretation of the electrophysiological results and improve the overall clarity and transparency of the study.

      Public Reviews:

      Reviewer #1 (Public review):

      Weakness:

      The authors focused primarily on female mice, limiting generalizability and leaving readers with questions about the impact of sex differences on their results. The tube test is used as a manipulation of the "emotional state" in several of the experiments. While the authors show changes in corticosterone levels as a consequence of win/loss in the tube test, stronger claims might be made with comparisons to other gold-standard stressors such as forced social defeat or social isolation.

      We thank the reviewer for these thoughtful comments.

      First, we acknowledge that the present study was conducted primarily in female mice, which may limit the generalizability of the findings. Female mice were selected to reduce variability associated with male aggression and housing-related stress, which can complicate behavioral assays such as social interaction and dominance testing. While focusing on a single sex allowed us to maintain experimental consistency across multiple behavioral paradigms, we agree that sex differences could influence the neural circuits underlying emotional and social behaviors. We have now added a statement in the Discussion acknowledging this limitation and noting that future studies will be necessary to determine whether similar circuit mechanisms operate in male mice.

      Second, we appreciate the reviewer’s suggestion regarding the use of other stress paradigms. In this study, the tube test was used primarily to establish social dominance relationships between paired mice rather than as a classical stress-induction paradigm. Nevertheless, repeated win/loss outcomes were associated with measurable physiological changes: corticosterone levels in brain lysates were significantly increased in loser mice after repeated tube-test competitions, indicating that the paradigm produced physiological responses consistent with stress-related processes. These findings suggest that repeated social competition in this context can induce transient physiological and behavioral changes associated with social hierarchy. We agree that paradigms such as chronic social defeat stress or social isolation are well-established models for inducing sustained stress responses. We have therefore revised the manuscript to clarify that the tube test in our study serves as a model of social competition and rank establishment rather than a canonical stress paradigm, and we highlight comparison with other stress models as an important direction for future work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Regarding Figure 7, their response does not really clarify the issue:

      (a) They argue that they are not making claims about synaptic strength. However, they still state: "In the mPFC→NAc pathway, blue light stimulation evoked larger excitatory postsynaptic currents (EPSCs) in winner mice compared to losers (Fig. 7E). This suggests stronger synaptic transmission in winners' mPFC→NAc circuits." They do not show this; they only show that, when normalized to an arbitrary value, the responses at shorter durations are higher or lower, which is very hard to interpret.

      They argue in the rebuttal that the aim of this is to highlight response kinetics, but these are not quantified or discussed in any way.

      We thank the reviewer for this helpful comment. We agree that the normalized input-output curves shown in the original submission did not allow conclusions about absolute synaptic strength, and we also acknowledge that response kinetics were not previously quantified despite being mentioned in the rebuttal.

      To address both concerns, we have revised Fig. 7 and added quantitative analyses of EPSC kinetics. Specifically, we measured the rise and decay slopes of light-evoked EPSCs recorded in postsynaptic neurons within the NAc and BLA of winner and loser mice. In the mPFC→BLA pathway, both the EPSC rise and decay slopes were significantly increased in loser mice compared with winners (rise slope: p = 0.0138; decay slope: p = 0.0392), suggesting enhanced synaptic responsiveness and faster charge transfer kinetics in BLA neurons of losers. In contrast, in the mPFC→NAc pathway, neither the rise nor the decay slope differed significantly between groups.

      These results provide a quantitative characterization of synaptic response dynamics and reveal pathway-specific differences in synaptic properties associated with social hierarchy. Importantly, this analysis does not rely on amplitude normalization and therefore allows a more interpretable comparison of synaptic response profiles between groups. We have updated Fig. 7 and the corresponding Results section to include these analyses. 

      (b) They still haven't labeled the responses correctly. The responses in figure 7 are not "voltage spikes" but light-evoked EPSCs.

      We apologize for the incorrect terminology. All instances of “voltage spikes” have been corrected to “light-evoked EPSCs” in the figure legends and text.

      (c) They argue that responses do not vary across experiments/slices because they use a constant viral injection volume targeted to the same co-ordinates and identical placement of the fiber and recording location. While I am sure they aim to do that, it is almost impossible to ensure that this was identical across experiments and that the degree of opsin labelling in their slices was the same (see, for example, Mao et al., 2011 PMID: 21982373, who pioneered the approach of using within-slice comparisons to account for this). If I understand their explanation of their strategy correctly, the authors' own rebuttal highlights this point: they seem to have needed to vary the LED duration by an order of magnitude (1-10 ms) to ensure reliable responses across experiments, even for the same projection.

      We thank the reviewer for raising this important point. We agree that it is not possible to ensure identical opsin expression or light delivery across experiments. We have revised the manuscript to explicitly acknowledge this limitation and clarify that normalization was used to mitigate, but not eliminate, inter-slice variability. We now avoid any interpretation that relies on absolute response amplitude across animals.

      Regarding “LED duration variability (1-10 ms)”, we agree that the need to adjust stimulation duration reflects variability in effective opsin activation across slices. We now clarify this point in the Methods and Results and emphasize that stimulation parameters were optimized to reliably evoke responses rather than to equate absolute light input across experiments.

      Importantly, our main conclusions do not rely on absolute EPSC amplitude comparisons. Instead, they are supported by analyses that are less sensitive to variability in opsin expression or light delivery, including EPSC kinetics (rise and decay slopes), paired-pulse ratio measurements, and AMPA/NMDA ratios. These complementary measures provide a more robust characterization of synaptic properties across conditions.

      (d) Similarly, in Fig. S6 it is unclear what they are showing. The Y axis is still labeled in pA, yet they claim this is an action potential? Also, this analysis is rather irrelevant to the data shown in Figure 7, as the pathway between PFC and BLA/NAc is not preserved.

      We thank the reviewer for pointing out the lack of clarity in Fig. S6. We agree that it does not directly inform the interpretation of Fig. 7 and may cause confusion. To improve the clarity and focus of the manuscript, we have therefore removed Fig. S6 from the revised manuscript. The removal of this supplementary figure does not affect the main conclusions of the study.

      (e) It now also seems that these experiments were performed by placing a fiber optic into the slice to elicit responses. This should be detailed in the methods.

      We thank the reviewer for noting this omission. We have added a detailed description of fiber-optic placement within the slice for optogenetic stimulation to the Methods section. Specifically, we clarify that blue light was delivered through a fiber optic positioned above the recorded slice to activate ChR2-expressing mPFC axon terminals within the BLA or NAc. The placement of the fiber relative to the recorded neurons and the stimulation parameters are now explicitly described in the revised Methods section.

    1. eLife Assessment

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide important support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells. While some of the conclusions of the authors are convincing, many aspects of the manuscript remain incomplete and could be improved, especially aspects of cell-type classification and validation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells.

      Strengths:

      High-quality datasets for mantle tissue in Crassostrea gigas and thorough comparisons with existing datasets for this species and other spiralians. Balanced discussion.

      Weaknesses:

      No major weaknesses. The analyses follow fairly standard approaches in the field that have been previously applied and developed in similar systems.

    3. Reviewer #2 (Public review):

      Summary:

      In their study, Bai et al. present three single-cell RNA-seq datasets derived from gastrulae, trochophores, and adults of the bivalve Crassostrea gigas. While a dataset on the oyster trochophore has been published previously (Piovani et al. 2023), the gastrula and adult datasets have not yet been published. The authors conclude that cell types secreting the oyster shell valves use a genetic repertoire that is also used by epithelial and secretory cell types of very different spiralians, such as annelids, chaetognaths and flatworms.

      Strengths:

      The study provides new single-cell datasets from multiple developmental stages of an oyster, offering a valuable resource for the field. It takes a broad comparative approach using state-of-the-art techniques across diverse animal groups and addresses an important question regarding the origin and evolution of shell-forming cell types.

      Weaknesses & suggestions to improve the manuscript:

      (1) Validation of cell types

      Cell type identities are not convincingly validated. Although the authors cite previous studies (l. 92), the referenced marker genes are largely not used, and the cited works do not provide sufficient spatial validation. Without in situ data, the inferred locations of cell types (e.g. Figure 2A) are not supported. Spatial validation of marker genes (e.g. via HCR) is essential, particularly for a study addressing shell field evolution. In addition, the gastrula dataset is not meaningfully analyzed, and its inclusion remains unclear.

      (2) Robustness of cell type classification

      Several proposed cell types may not represent distinct entities (not individuated) but rather reflect over-clustering. Marker genes are often not specific and are shared across clusters (e.g. Sec1/Sec2), making it difficult to distinguish cell types reliably.

      (3) Comparative analysis of secretory cells

      The comparative framework is not sufficiently supported. Secretory cells are highly diverse, and without proper validation, their comparison across taxa is not meaningful. The transcription factor analysis is limited, as only a few genes are shared and many are inconsistently expressed (Figure 3E). The conclusion of a conserved regulatory program across spiralians is therefore overstated.

      (4) Clarity and interpretation of results

      Results are at times difficult to follow and remain superficial. Marker genes are insufficiently annotated (especially for Crassostrea), and comparisons across taxa lack functional interpretation. Unvalidated and heterogeneous cell types are grouped together, and transcriptional similarities are overinterpreted. Overall, key conclusions are not adequately supported by the presented data.

    4. Reviewer #3 (Public review):

      Summary:

      This manuscript by Bai et al. reports single-cell transcriptomics of the oyster mantle to elucidate the respective contributions of ancient conserved programmes and lineage-specific genes to the origin of the molluscan shell. The authors compare their dataset with other oyster larval datasets as well as data from other organisms (annelids, chaetognaths) and find evidence of evolutionary conservation and functional similarity with secretory cell types. They also observe that cells involved in secreting the larval skeleton express predominantly recent genes, whereas the adult skeleton-secreting programme is evolutionarily more conserved.

      Strengths:

      The manuscript is well written and clearly presented, and the results are interesting, particularly the distinction between larval and adult skeleton secretion, which is placed in a thoughtful evolutionary context.

      Weaknesses:

      (1) My main concern is that the authors rely primarily on previous studies for the experimental and functional characterisation of the identified cell types. The cited papers (Piovani, 2023 and de la Forest Divonne et al., 2025) deal with distinct stages or tissues (larvae and hemocytes, respectively), which limits their direct relevance. The authors also cite other papers for in situ expression data; it would be helpful to summarise somewhere (e.g. in a table) which genes have been experimentally characterised and what their expression domains are, or alternatively to provide HCR or in situ staining on the mantle. For instance, what is the rationale for the claim that proliferative cells give rise to the mantle? The trajectory inference approach used (Monocle) would likely yield a similar result regardless of the reference cell type, so additional justification is needed.

      (2) More broadly, I find that the functional properties of the identified cell types and their relationship to the expressed genes deserve more detailed discussion. For example, at L100, several genes are mentioned, but their functional roles are not discussed. Similarly, the basis for annotating the proliferative cells is not explained. How was gene orthology assessed? Throughout the manuscript, vertebrate-style gene names are used without explicitly establishing orthology status in oyster, which should be addressed.

      (3) More detail is needed on the methods and quality control for the single-cell data. The authors should clarify that the platform used (BMKMANU) is a droplet-based technology comparable in principle to Drop-seq. BMKMANU is not widely used in the field. How does it compare to 10x Genomics in terms of sensitivity and cell recovery? The authors appear to use the 10x Chromium cellranger pipeline for data analysis, which suggests compatibility, but this should be stated explicitly. Additionally, no information is provided on the number of sequencing runs or biological replicates, nor on how reproducible the results are across samples.

      (4) A limitation of the phylostratigraphic analysis is that it is restricted to mantle tissue, making it difficult to place the results in a whole-organism context. How do the age profiles of mantle-expressed genes compare to those of more evolutionarily conserved tissues, such as the nervous system? I appreciate the methodological and experimental constraints, but this is a genuine limitation of the study. The authors could at least discuss it explicitly, and ideally consider generating a broader single-cell atlas of the oyster to provide this comparative baseline.

      (5) Have the authors considered the potential importance of lineage-specific gene duplication? It is well established that spiralians, including oysters, have undergone extensive lineage-specific duplication of transcription factors such as homeobox genes, and many structural shell-associated proteins may similarly have been duplicated. This could be relevant to interpreting both the phylostratigraphic results and the expansion of secretory gene families.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript examines the evolution of molluscan shells using single-cell analyses of the adult mantle of Crassostrea gigas and compares these data with previous datasets from embryonic and larval stages of this species and other spiralians. The authors provide support for a scenario in which secretory cells are broadly conserved across spiralians, and the incorporation of lineage-restricted genes contributes to the evolution of molluscan shells.

      Strengths:

      High-quality datasets for mantle tissue in Crassostrea gigas and thorough comparisons with existing datasets for this species and other spiralians. Balanced discussion.

      Weaknesses:

      No major weaknesses. The analyses follow fairly standard approaches in the field that have been previously applied and developed in similar systems.

      We thank the reviewer for the positive evaluation of our work. We are encouraged that the reviewer finds our conclusions balanced and the analyses appropriate. Although no major concerns were raised, we will incorporate clarifications and improvements prompted by the other reviewers to further strengthen the manuscript.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) Validation of cell types

      Cell type identities are not convincingly validated. Although the authors cite previous studies (l. 92), the referenced marker genes are largely not used, and the cited works do not provide sufficient spatial validation. Without in situ data, the inferred locations of cell types (e.g. Figure 2A) are not supported. Spatial validation of marker genes (e.g. via HCR) is essential, particularly for a study addressing shell field evolution. In addition, the gastrula dataset is not meaningfully analyzed, and its inclusion remains unclear.

      We thank the reviewer for this important comment regarding cell type validation. In the previous version of the manuscript, we provided a detailed compilation of referenced marker genes from previous studies in Supplementary File 2. It is possible that, due to an incorrect or unclear reference in the main text, this information was not readily accessible. We will correct and clarify these citations in the revised manuscript to ensure that these resources are clearly presented.

      We agree that spatial validation would provide important support for cell type identities. In the revised version, we will strengthen this aspect by selecting more specific marker genes for each SEC cluster and performing fluorescence in situ hybridisation (FISH) to validate their spatial localization.

      Regarding the gastrula dataset, our original intention was to investigate the developmental transition of shell gland-related cell populations from gastrula to trochophore stages. However, following the reviewer’s suggestion and considering the limited interpretability of the gastrula dataset in its current form, we agree that its inclusion does not substantially strengthen the study. We therefore plan to remove the gastrula dataset from the revised manuscript, and instead focus on the trochophore stage as a representative developmental stage for larval shell formation, enabling a clearer comparison between larval and adult shell-forming cell populations. We note that this change does not affect the main conclusions of the study. In addition, we will curate a refined set of experimentally supported marker genes, and provide an updated supplementary table summarizing detailed information, including cell type annotations, literature sources, and experimental validation methods.

      (2) Robustness of cell type classification 

      Several proposed cell types may not represent distinct entities (not individuated) but rather reflect over-clustering. Marker genes are often not specific and are shared across clusters (e.g. Sec1/Sec2), making it difficult to distinguish cell types reliably.

      In the revised manuscript, we will refine marker gene selection by prioritizing genes with higher specificity and stronger discriminatory power to improve the robustness of cell type identification. To further support cell identity assignment, we will select representative marker genes for SEC clusters and perform FISH to validate their spatial localization. These revisions will lead to a more robust and conservative interpretation of cell populations.

      (3) Comparative analysis of secretory cells

      The comparative framework is not sufficiently supported. Secretory cells are highly diverse, and without proper validation, their comparison across taxa is not meaningful. The transcription factor analysis is limited, as only a few genes are shared and many are inconsistently expressed (Figure 3E). The conclusion of a conserved regulatory program across spiralians is therefore overstated.

      We agree that secretory cell types are highly diverse across spiralians and that cross-species comparisons require careful interpretation. In the revised manuscript, we will adopt a more cautious framework, highlighting partial conservation of regulatory programs alongside functional convergence in secretory processes. We will also strengthen the comparative framework by integrating functional annotations, which may provide complementary support beyond individual gene overlaps. Importantly, we will improve the reliability of oyster SEC annotations through FISH-based spatial validation, thereby increasing confidence in cross-species comparisons. These revisions will provide a more balanced and biologically grounded interpretation of secretory cell evolution across spiralians.

      (4) Clarity and interpretation of results

      Results are at times difficult to follow and remain superficial. Marker genes are insufficiently annotated (especially for Crassostrea), and comparisons across taxa lack functional interpretation. Unvalidated and heterogeneous cell types are grouped together, and transcriptional similarities are overinterpreted. Overall, key conclusions are not adequately supported by the presented data.

      In the revised manuscript, we will re-evaluate marker gene annotations to ensure support from existing experimental evidence. For SEC populations, we will validate representative markers using FISH. We will also expand the functional annotation of marker genes and strengthen cross-species comparisons. In addition, we will substantially revise the Results and Discussion sections to improve clarity and depth, reduce overinterpretation of transcriptional similarities, and ensure that all conclusions are more tightly aligned with the strength of the supporting evidence.

      Reviewer #3 (Public review):

      Weaknesses:

      (1) My main concern is that the authors rely primarily on previous studies for the experimental and functional characterisation of the identified cell types. The cited papers (Piovani, 2023 and de la Forest Divonne et al., 2025) deal with distinct stages or tissues (larvae and hemocytes, respectively), which limits their direct relevance. The authors also cite other papers for in situ expression data; it would be helpful to summarise somewhere (e.g. in a table) which genes have been experimentally characterised and what their expression domains are, or alternatively to provide HCR or in situ staining on the mantle. For instance, what is the rationale for the claim that proliferative cells give rise to the mantle? The trajectory inference approach used (Monocle) would likely yield a similar result regardless of the reference cell type, so additional justification is needed.

      We agree that our reliance on previous studies for functional and experimental characterization requires clearer justification and integration. In the revised manuscript, we will compile a new supplementary table summarizing marker genes with available experimental validation, including their associated cell types, literature sources, and experimental methods. For SEC populations, we will select representative marker genes and perform FISH to validate their spatial localization, thereby providing independent support for cell identity.

      Regarding trajectory inference, we agree that methods such as Monocle are sensitive to assumptions. We will clarify the rationale for root cell selection, test alternative root assignments to assess robustness, and revise our interpretation to avoid strong lineage claims. Rather than stating that proliferative cells give rise to mantle cells, we will describe the observed trajectory as being consistent with a potential developmental relationship, while acknowledging that this does not constitute direct evidence of lineage progression.

      (2) More broadly, I find that the functional properties of the identified cell types and their relationship to the expressed genes deserve more detailed discussion. For example, at L100, several genes are mentioned, but their functional roles are not discussed. Similarly, the basis for annotating the proliferative cells is not explained. How was gene orthology assessed? Throughout the manuscript, vertebrate-style gene names are used without explicitly establishing orthology status in oyster, which should be addressed.

      We thank the reviewer for this important comment. In the revised manuscript, we will expand the functional interpretation of key genes by incorporating available literature and, where possible, functional annotations. We will also clarify the basis for cell type annotation and explicitly describe the criteria used, including for proliferative cell populations (e.g. cell proliferation-associated markers).

      Regarding gene annotation, gene names in oyster were assigned based on sequence similarity searches against the eggNOG database. In the revised manuscript, we will provide a comprehensive supplementary table linking gene IDs to their annotations, along with the corresponding database sources. In addition, we will clearly describe how orthology relationships were assessed, including the methods and criteria used (e.g. sequence similarity searches and orthology databases). Throughout the revised manuscript, we will ensure that the use of vertebrate-style gene names is accompanied by appropriate annotation information and does not imply unsupported one-to-one orthology relationships.

      (3) More detail is needed on the methods and quality control for the single-cell data. The authors should clarify that the platform used (BMKMANU) is a droplet-based technology comparable in principle to Drop-seq. BMKMANU is not widely used in the field. How does it compare to 10x Genomics in terms of sensitivity and cell recovery? The authors appear to use the 10x Chromium cellranger pipeline for data analysis, which suggests compatibility, but this should be stated explicitly. Additionally, no information is provided on the number of sequencing runs or biological replicates, nor on how reproducible the results are across samples.

      In the revised manuscript, we will expand the Methods section to provide a clearer and more detailed description of the experimental and analytical procedures. BMKMANU is a droplet-based single-cell RNA-seq platform, conceptually comparable to Drop-seq and similar in principle to 10x Chromium. We will also explicitly state that the data generated are compatible with the Cell Ranger pipeline, which was used for downstream processing and analysis. Although BMKMANU is less widely used than 10x Genomics platforms, it has been successfully applied in several recent studies (e.g. Li et al., 2024: https://doi.org/10.1007/s11427-023-2548-3; Li et al., 2025: https://doi.org/10.1038/s41559-025-02642-6; Wei et al., 2024: https://doi.org/10.1038/s41467-024-46780-0), demonstrating its applicability for single-cell transcriptomic analyses across different biological systems. Regarding platform performance, technical information provided by the manufacturer indicates that BMKMANU has sensitivity and cell capture efficiency comparable to those of 10x Genomics platforms (http://www.biomarker.com.cn/zhizao/dg1000danxibao). In this study, the mantle sample was obtained from a single oyster and processed in a single sequencing run, so no batch effects from multiple runs were introduced. We will clearly state this in the revised manuscript. In addition, we will provide detailed quality control metrics, including the number of cells retained, gene detection rates, and filtering criteria.

      (4) A limitation of the phylostratigraphic analysis is that it is restricted to mantle tissue, making it difficult to place the results in a whole-organism context. How do the age profiles of mantle-expressed genes compare to those of more evolutionarily conserved tissues, such as the nervous system? I appreciate the methodological and experimental constraints, but this is a genuine limitation of the study. The authors could at least discuss it explicitly, and ideally consider generating a broader single-cell atlas of the oyster to provide this comparative baseline.

      We agree that restricting the phylostratigraphic analysis to mantle tissue represents a limitation when attempting to place our findings in a whole-organism evolutionary context. In the revised manuscript, we will explicitly acknowledge this limitation and expand the Discussion to address how gene age profiles in mantle tissue may differ from those in more evolutionarily conserved tissues. In particular, we will clarify that the enrichment of younger, lineage-specific genes observed in shell-forming cells may reflect tissue-specific functional specialization, and therefore should not be directly generalized to other cell types.

      We acknowledge that a broader single-cell atlas spanning multiple tissues would provide an important comparative baseline for interpreting gene age patterns across the organism. While generating such a dataset is beyond the scope of the present study, we will highlight this as an important direction for future research.

      (5) Have the authors considered the potential importance of lineage-specific gene duplication? It is well established that spiralians, including oysters, have undergone extensive lineage-specific duplication of transcription factors such as homeobox genes, and many structural shell-associated proteins may similarly have been duplicated. This could be relevant to interpreting both the phylostratigraphic results and the expansion of secretory gene families.

      We thank the reviewer for this insightful suggestion. Lineage-specific gene duplication is likely to play an important role in shaping both transcription factor repertoires and shell-associated gene families in spiralians, including oysters. In the revised manuscript, we will incorporate a discussion of lineage-specific duplication, particularly in relation to transcription factors and biomineralization-related proteins. We will also, where feasible, explore its potential contribution to our observations and highlight how such duplications may drive the expansion and diversification of secretory gene families.

    1. eLife Assessment

      This study presents a valuable perspective on platelet-mediated fibrin compaction, proposing that fibrin fibers undergo "winding" or coiling, an intriguing framework with potential implications for thrombosis and clot mechanics. However, the evidence supporting an active platelet-driven winding mechanism remains incomplete, relying largely on correlative observations without direct or quantitative validation of the proposed dynamics. Overall, the work is thought-provoking and of clear interest to the field, but stronger mechanistic evidence will be required to substantiate the central claims.

    2. Reviewer #1 (Public review):

      This paper reports a previously unrecognized mechanism by which platelets compact fibrin fibers during clot retraction. Rather than simply pulling on fibers, the authors propose that platelets generate swirling motions that wind and loop fibrin into dense structures.

      While the results are intriguing, the underlying physical mechanism remains unexplained. In particular, it is unclear how platelets generate swirling motion capable of inducing fibrin coiling, especially when suspended in a 3D fibrin mesh. This raises concerns about the conclusions. Also, does fibrin have inherent chirality or structural asymmetry that could promote coiling independently of platelet activity? Furthermore, platelet retraction typically involves platelet aggregation rather than isolated cells, and it is unclear how fibrin coiling would proceed in clustered platelets.

    3. Reviewer #2 (Public review):

      Summary:

      Grichine et al. investigate platelet-mediated fibrin compaction using human donor platelets and propose a novel mechanistic model in which platelets generate contractile forces and wind fibrin fibers into compact coiled structures. Using a combination of 2D spread assays, 3D clot imaging via expansion microscopy, live-cell imaging, and computational modelling, the authors present evidence of cage-like fibrin architectures, coiled-fibre morphologies, and platelet-centred "rosette" structures present during fibre compaction. They further suggest that actomyosin-driven cytoskeletal dynamics, potentially involving rotational or swirling motion, underlie this proposed winding mechanism, analogous to DNA looping and compaction. The study addresses an important and longstanding question in thrombosis and hemostasis and offers a conceptually novel perspective on clot compaction.

      Strengths:

      The integration of multiple imaging modalities is a notable strength of this paper. In particular, the 2D fiber-retraction assay provides a useful model for understanding the spatio-temporal dynamics of platelet-mediated fibrin compaction, which can be applied to other systems and may yield detailed mechanistic insights into biological processes. The live-imaging approaches are particularly well executed and offer valuable dynamic insight.

      Weaknesses:

      The primary weakness of this paper lies in its descriptive nature and its reliance on correlative rather than causal evidence. Several interpretations are not uniquely supported by the data presented. For example, the categorisation of fibrin accumulation in 2D assays as "fiber winding" and "fibre compaction" remains descriptive without establishing winding as a mechanism. Alternative mechanisms, such as circular bundling, stacked fibers under tension, or fibrin crosslinking-induced aggregation, are neither excluded nor investigated. Although the authors present compelling live imaging, establishing winding as a dynamic phenotype would require quantitative analyses, such as measuring angular velocities and coiling rates. The use of a second fluorophore-labelled fibrin population could further strengthen evidence for rotational dynamics. Similarly, the inference of rotational contractility or actomyosin "swirling", based on chiral actin organisation and blebbistatin treatment, is not sufficiently supported to conclude that platelets actively wind or loop fibrin fibers. The mathematical model, while complementary and well-constructed, relies on multiple assumptions and lacks predictive validation.

      Appraisal:

      While the authors successfully document intriguing fibrin architectures and provide a compelling descriptive framework, they do not fully demonstrate a mechanistic model of active fibrin winding by platelets. The conclusions regarding platelet-driven winding and rotational dynamics are not sufficiently supported by direct or quantitative evidence. To substantiate these claims, the study would benefit from experiments that directly link platelet dynamics to fibrin organisation, including coordinated measurements of platelet motion and fibre rearrangement. As it stands, the results are suggestive but do not definitively support the proposed mechanism.

      Discussion and Impact:

      Despite these limitations, the study addresses an important question in thrombosis and hemostasis and introduces a potentially impactful conceptual framework for understanding clot compaction. The imaging approaches and datasets presented will be valuable to the community, particularly for researchers interested in platelet mechanics and fibrin organisation. However, the overall impact will depend on whether the proposed mechanism can be more rigorously validated. In its current form, the study presents an interesting and thought-provoking model, but would benefit from either stronger experimental support for the proposed mechanisms or a more cautious interpretation of the findings.

    4. Reviewer #3 (Public review):

      Summary:

      This work aims to understand the mechanisms that platelets use to interact with and compact fibrin fibers during clot formation. This is an important process during wound healing, and recent work has demonstrated that platelets play a critical role in generating the force required to drive the accumulation of fibrin. The authors argue that current models are insufficient to account for the observed reduction in clot volume and propose that platelets actively 'wind up' these fibers by undergoing myosin-dependent rotation. While interesting, the experiments performed by the authors do not directly test this mechanism, and further evidence is required to support their claims.

      Weaknesses:

      (1) The motivation to switch from the system used in Figures 1 and 2 to the '2D fiber-retraction assay' is not clear. While the authors state that this system has 'reduced complexity', the differences between these assays appear to disrupt the 'cage-like' organization of fibrin around platelets shown in Figures 1 and 2 (compare images in Figure 2 with those in Figure 4). An in-depth comparison of the two methods is needed to support the conclusions from the 2D system. Furthermore, the change in plasma volume (Figure 2 vs Figure 7) should also be tested - the authors state that this increases fibrin fiber formation, but this is not quantified or demonstrated in the figures. Notably, this appears to change the morphology of the fibrin fibers shown (comparing Figure 2 and Figure 7).

      (2) It is unclear how the classification of platelets as 'fiber-winding' versus 'fiber compaction' differs in Figure 2. The criteria used for these classifications should be stated. Further, it seems premature to characterize fibers as wound without having established this earlier in the manuscript.

      (3) Is the 'gearwheel' different from the 'cage' of fibrin fibers? They appear similar, but it is difficult to distinguish between them with only qualitative descriptions of these phenotypes.

      (4) The quantification of platelet extensions in Figure 9 is confusing. While those in 9A are clear, those in 9B are not. For instance, what is the difference between #7 and #8 in the middle panel of 9B? It does not seem like #8 is labeling an extension.

      (5) It is unclear what the modeling accomplishes, as there is no comparison between the results of these simulations and their experiments.

      (6) The data presented in Figure 12 provides the most direct support for their mechanism, but falls short of directly testing their claims. These experiments should be repeated to include blebbistatin to test the contribution of myosin and include quantitative rather than qualitative comparisons of these experiments.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This paper reports a previously unrecognized mechanism by which platelets compact fibrin fibers during clot retraction. Rather than simply pulling on fibers, the authors propose that platelets generate swirling motions that wind and loop fibrin into dense structures.

      While the results are intriguing, the underlying physical mechanism remains unexplained. In particular, it is unclear how platelets generate swirling motion capable of inducing fibrin coiling, especially when suspended in a 3D fibrin mesh. This raises concerns about the conclusions.

      We explained our hypothesis of how platelets may physically generate the swirling motion in lines 200-215 and in the Discussion under "Ideas and Speculations". However, we will provide a more detailed explanation of this process in the revised version.

      The reviewer is right: it is difficult to imagine how platelets in a 3D fibrin mesh can accumulate fibers at the base of their extensions to form a cage-like fiber organisation around the platelet center. We therefore developed the 2D fiber-retraction assay, which we believe provides important insights into the coiled fiber accumulations above spread platelets in 2D and also provides a framework for interpreting similar processes that may occur within a 3D clot. In response, we will place greater emphasis on clarifying and strengthening the comparison of the potential mechanistic aspects of the 2D and 3D assays, in order to better support our proposed model.

      Also, does fibrin have inherent chirality or structural asymmetry that could promote coiling independently of platelet activity?

      Yes, double-stranded fibrin protofibrils have a helical twist [1]. Furthermore, a clot formed in the absence of platelets and other cellular components shows intrinsic tensile forces [2]. However, we show that inhibition of actomyosin activity prevents fibrin fiber accumulation in the 2D fiber-retraction assay, providing evidence that platelet activity is necessary to observe the coiled fibers above spread platelets.

      Furthermore, platelet retraction typically involves platelet aggregation rather than isolated cells, and it is unclear how fibrin coiling would proceed in clustered platelets.

      Under the in vitro fiber-retraction conditions used in our study (constrained or unconstrained clots, or the 2D assay), individual platelets are homogeneously distributed within the forming clot or on the coverslip. Therefore, there are no large platelet aggregates or platelet clusters under our experimental conditions, and the results only show how individual platelets act on the fibrin fibers. We will emphasize this point in the revised version.

      Reviewer #2 (Public review):

      Summary:

      Grichine et al. investigate platelet-mediated fibrin compaction using human donor platelets and propose a novel mechanistic model in which platelets generate contractile forces and wind fibrin fibers into compact coiled structures. Using a combination of 2D spread assays, 3D clot imaging via expansion microscopy, live-cell imaging, and computational modelling, the authors present evidence of cage-like fibrin architectures, coiled-fibre morphologies, and platelet-centred "rosette" structures present during fibre compaction. They further suggest that actomyosin-driven cytoskeletal dynamics, potentially involving rotational or swirling motion, underlie this proposed winding mechanism, analogous to DNA looping and compaction. The study addresses an important and longstanding question in thrombosis and hemostasis and offers a conceptually novel perspective on clot compaction.

      Strengths:

      The integration of multiple imaging modalities is a notable strength of this paper. In particular, the 2D fiber-retraction assay provides a useful model for understanding the spatio-temporal dynamics of platelet-mediated fibrin compaction, which can be applied to other systems and may yield detailed mechanistic insights into biological processes. The live-imaging approaches are particularly well executed and offer valuable dynamic insight.

      Weaknesses:

      The primary weakness of this paper lies in its descriptive nature and its reliance on correlative rather than causal evidence. Several interpretations are not uniquely supported by the data presented. For example, the categorisation of fibrin accumulation in 2D assays as "fiber winding" and "fibre compaction" remains descriptive without establishing winding as a mechanism.

      In the revised version, we will avoid the terms fiber winding/compaction when introducing the 2D fiber-retraction assay (Figure 3) to better align with the level of evidence, since coiled fibers cannot be distinguished in this figure. However, coiled fibers above spread platelets are clearly visible in Figures 4 and 8, and dynamic fiber rotation or winding is observed in Figure 12 and Video 9. These observations will be presented more cautiously, as indicative rather than definitive evidence of a winding mechanism.

      Alternative mechanisms, such as circular bundling, stacked fibers under tension, or fibrin crosslinking-induced aggregation, are neither excluded nor investigated.

      Fibrin fiber bundling and the formation of staggered or crosslinked protofilaments do not require platelet activity, as described previously [2, 3]. Since we observed a clear difference between the +/- blebbistatin conditions in the 2D fiber-retraction assay, the fiber compaction we observe depends on platelet activity. Consequently, we consider these alternative mechanisms unlikely based on our data. This will be stated explicitly in the Results section.

      Although the authors present compelling live imaging, establishing winding as a dynamic phenotype would require quantitative analyses, such as measuring angular velocities and coiling rates.

      We will incorporate quantitative measurements to complement the observations obtained from live imaging. It is important to note, however, that angular velocities and coiling rates are likely influenced by the number of fiber–fiber contacts present at the time coiling occurs. Specifically, an increased number of contacts is expected to elevate tension within the network, thereby modulating the forces generated by platelets and, consequently, affecting both velocity and coiling dynamics.

      The use of a second fluorophore-labelled fibrin population could further strengthen evidence for rotational dynamics.

      These live videos are quite difficult to acquire because of the following reasons:

      Small platelet size

      Heterogeneity of platelets within the population (10 d half-life, old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time needed to adjust image-acquisition parameters necessitate an arbitrary choice of acquisition window, and only one acquisition (90 min) is possible per sample preparation.

      Furthermore, laser illumination can perturb the processes under observation. We therefore use high-spatial-resolution 3D confocal time-lapse imaging, performed in photon-counting mode with very low laser excitation.

      For these reasons, the use of additional markers would be technically challenging and could perturb the delicate equilibrium and dynamics of the process under investigation.

      Similarly, the inference of rotational contractility or actomyosin "swirling", based on chiral actin organisation and blebbistatin treatment, is not sufficiently supported to conclude that platelets actively wind or loop fibrin fibers.

      Importantly, in the 2D fiber-retraction assay, we do not propose that the rotational actomyosin activity leads to a contractility of the platelets which would allow fiber retraction. Rather, we suggest that cytoskeletal actomyosin swirling (as demonstrated for nucleated cells by Bershadsky's team) can induce rotational dragging of extracellular bound fibrin fibers around the pseudonucleus of spread platelets thereby promoting accumulation of fibrin fibers. Consistent with this interpretation, inhibition of myosin by blebbistatin prevents the accumulation of fibrin fibers above spread platelets in the 2D fiber-retraction assay (Fig. 3).

      The mathematical model, while complementary and well-constructed, relies on multiple assumptions and lacks predictive validation.

      We thank the reviewer for this insightful comment and acknowledge that the proposed model relies on several important assumptions. In our view, the most significant assumption is that integrin molecules undergo rotational downstream motion as a consequence of their coupling to the swirling cytoskeleton. To assess the necessity and impact of these assumptions, we will perform additional calculations and include the results in the Supplementary Information. These analyses will also provide further validation of the proposed model and underlying mechanism. At the same time, it is important to emphasize that the primary purpose of the model was to examine whether the hypothetical swirling dynamics of the cytoskeleton, together with the associated receptors, could in principle reproduce the experimentally observed fibrin organization.

      Appraisal:

      While the authors successfully document intriguing fibrin architectures and provide a compelling descriptive framework, they do not fully demonstrate a mechanistic model of active fibrin winding by platelets. The conclusions regarding platelet-driven winding and rotational dynamics are not sufficiently supported by direct or quantitative evidence. To substantiate these claims, the study would benefit from experiments that directly link platelet dynamics to fibrin organisation, including coordinated measurements of platelet motion and fibre rearrangement. As it stands, the results are suggestive but do not definitively support the proposed mechanism.

      Discussion and Impact:

      Despite these limitations, the study addresses an important question in thrombosis and hemostasis and introduces a potentially impactful conceptual framework for understanding clot compaction. The imaging approaches and datasets presented will be valuable to the community, particularly for researchers interested in platelet mechanics and fibrin organisation. However, the overall impact will depend on whether the proposed mechanism can be more rigorously validated. In its current form, the study presents an interesting and thought-provoking model, but would benefit from either stronger experimental support for the proposed mechanisms or a more cautious interpretation of the findings.

      We agree that the proposed mechanism requires further validation. In the revised manuscript, we will therefore present a more cautious and explicitly hypothesis-driven interpretation of the mechanism. We hope that the publication of our observations will be of interest to researchers in the field of thrombosis and clot mechanics who possess the specialized tools and expertise necessary to rigorously evaluate and either substantiate or refute the proposed mechanistic model.

      Reviewer #3 (Public review):

      Summary:

      This work aims to understand the mechanisms that platelets use to interact with and compact fibrin fibers during clot formation. This is an important process during wound healing, and recent work has demonstrated that platelets play a critical role in generating the force required to drive the accumulation of fibrin. The authors argue that current models are insufficient to account for the observed reduction in clot volume and propose that platelets actively 'wind up' these fibers by undergoing myosin-dependent rotation. While interesting, the experiments performed by the authors do not directly test this mechanism, and further evidence is required to support their claims.

      Weaknesses:

      (1) The motivation to switch from the system used in Figures 1 and 2 to the '2D fiber-retraction assay' is not clear. While the authors state that this system has 'reduced complexity', the differences between these assays appear to disrupt the 'cage-like' organization of fibrin around platelets shown in Figures 1 and 2 (compare images in Figure 2 with those in Figure 4). An in-depth comparison of the two methods is needed to support the conclusions from the 2D system.

      We agree that the cage-like fibrin organization around platelets is disrupted in the 2D fiber-retraction assay when platelets are completely spread on the coverslip before they encounter fibrin fibers (Fig. 4). However, some platelets form the same number of extensions as platelets in a 3D clot (Fig. 9A, B) and are not completely spread on the glass surface. For these platelets, a cage-like fibrin organisation is retained under 2D conditions (Fig. 5 and 6). The fiber density at the base of the bulbs is, however, higher in the 2D assay than under the constrained 3D clot retraction conditions (Fig. 1C and Fig. 2), probably because in the 2D condition the fibers are less constrained and readily available for compaction.

      Furthermore, the change in plasma volume (Figure 2 vs Figure 7) should also be tested - the authors state that this increases fibrin fiber formation, but this is not quantified or demonstrated in the figures. Notably, this appears to change the morphology of the fibrin fibers shown (comparing Figure 2 and Figure 7).

      We thank the reviewer for raising this point. We would like to clarify that Figure 2 and Figure 7 correspond to two distinct experimental setups: the constrained clot retraction assay (Figure 2) and the 2D fiber-retraction assay (Figure 7). As such, they are not directly comparable. We understand, however, that the reviewer is likely referring to the apparent differences between Figures 3–6 (lower plasma volume, higher fiber density) and Figures 7–8 (higher plasma volume, lower apparent fiber density).

      The reduced number of visible fibers in the latter condition is not solely a consequence of plasma volume per se, but rather results from the formation of a labile fibrin gel at higher plasma concentrations, which is lost during the fixation and aspiration steps. This effect was initially observed across samples from two donors with differing plasma fibrinogen levels. In one case, an unusually low fibrinogen concentration allowed the addition of higher plasma volumes without inducing gel formation. In contrast, in the other sample, a more typical fibrinogen level resulted in gel formation under the same conditions.

      Importantly, we performed all experiments using matched donor plasma and platelets. As a result, the precise fibrinogen concentration could not be determined prior to experimentation. Nonetheless, post hoc measurements confirmed that fibrinogen levels in most donor samples fell within the normal physiological range, which allowed us to use the same plasma volumes for the low and high plasma concentrations (4 µl/ml PBS and 7 µl/ml PBS, respectively) for all donors except the one mentioned above.

      (2) It is unclear how the classification of platelets as 'fiber-winding' versus 'fiber compaction' differs in Figure 2. The criteria used for these classifications should be stated. Further, it seems premature to characterize fibers as wound without having established this earlier in the manuscript.

      The reviewer is probably referring to Figure 3 and is right that it is premature to mention fiber winding at this stage of the Results section (see our response to Reviewer #2). In the revised version, we will therefore present the criteria used to classify the different degrees of fiber accumulation without referring to fiber winding.

      (3) Is the 'gearwheel' different from the 'cage' of fibrin fibers? They appear similar, but it is difficult to distinguish between them with only qualitative descriptions of these phenotypes.

      The "gearwheel" is observed for completely spread platelets in the 2D fiber-retraction assay. A figure illustrating our speculative comparison of the 2D gearwheel with the 3D clot situation is presented in the Discussion under the "Ideas and Speculations" paragraph (Fig. 13). We will give a more comprehensive explanation in the revised version.

      (4) The quantification of platelet extensions in Figure 9 is confusing. While those in 9A are clear, those in 9B are not. For instance, what is the difference between #7 and #8 in the middle panel of 9B? It does not seem like #8 is labeling an extension.

      For the platelet shown in the middle panel of Figure 9B, the extensions cannot be clearly distinguished in the MIP (Maximum Intensity Projection) image because extension #8 is positioned above extension #7 and is therefore superimposed in the projection. However, the two extensions can be differentiated when examining the 3D image stack (Video 4). As indicated in the figure legend, the number of extensions was determined manually by scrolling through the z-stack image sequence. In the revised version, we will also define the abbreviation “MIP” as Maximum Intensity Projection.

      (5) It is unclear what the modeling accomplishes, as there is no comparison between the results of these simulations and their experiments.

      We thank the reviewer for raising this valuable concern. We chose not to combine the experimental fibrin organization and the modeling results within the same figure panel, as the resulting image would be too complex and difficult to interpret. However, we will provide a more detailed comparison between the experimental observations and the modeling results in the Results section. It is also important to emphasize that the comparison between the model and the experimental data was intended to be primarily qualitative rather than quantitative.

      (6) The data presented in Figure 12 provides the most direct support for their mechanism, but falls short of directly testing their claims. These experiments should be repeated to include blebbistatin to test the contribution of myosin and include quantitative rather than qualitative comparisons of these experiments.

      As mentioned already above, these live videos are quite tricky to acquire because of the following reasons:

      Small platelet size

      Heterogeneity of platelets within the population (10 d half-life, old platelets may not be able to compact fibers efficiently).

      The speed of the process and the time required to optimize imaging parameters, necessitate the selection of an arbitrary acquisition window. Consequently, only a single acquisition of approximately 90 min can be performed per sample preparation, with no guarantee that relevant platelet-fibrin interactions can be acquired in the acquisition window.

      Furthermore, after blood donation, the first sample is usually ready for acquisition around 3 pm, and each acquisition takes 90 min. At least 10 successful acquisitions per condition would be required to ensure statistical robustness, but at most 4 can be acquired per donor, because platelet samples start to deteriorate within twelve hours of blood donation.

      Taken together, the intrinsic heterogeneity of the platelet population, the low likelihood of capturing informative events, and the limited availability of suitable imaging resources at our institute render a robust and quantitative comparison between conditions with and without blebbistatin extremely challenging, if not impractical, within a reasonable timeframe.

    1. Author response:

      eLife Assessment

      This valuable study reports that the ALDH-abundant cells display stem cell properties and may play a key role in the endometrial epithelial development in the mouse. The data supporting the main conclusion are solid, although further improvements are needed to strengthen the conclusions. This work will be of great interest to reproductive biologists and biomedical researchers working on women's reproductive health.

      We thank the reviewers and editor for their critical reading and assessment of our manuscript. We carefully considered each of the points raised by the reviewers. In this document and in the edited manuscript and figures, we have carefully addressed each of the comments and requested modifications. In light of these changes, we expect that you will find that the manuscript has improved.

      We indicate our responses to the reviewers below in blue font and highlight the changes in the manuscript using the line numbers corresponding to the tracked version of the revised document.

      Public Reviews:

      Reviewer #1 (Public review):

      The manuscript by Tang et al. characterizes the expression dynamics and functional roles of aldehyde dehydrogenase 1 activity in uterine physiology. Using a combination of in vivo lineage tracing and cell ablation coupled with organoid culture, the authors propose that Aldh1a1 lineage-marked cells contribute to uterine gland development and epithelial regeneration. The descriptive data will be of interest to reproductive biologists and clinicians and will build on established hypotheses in the field. The manuscript is well written and scientifically sound; however, several experimental limitations and interpretation caveats should be addressed.

      We thank the reviewer for their comments and expert assessment of our paper.

      (1) The methods surrounding the passage number and duration of culture following sorting prior to transcriptomic profiling should be clarified in the figure legends. Related to this, the representative images in Figures 1D and 1E do not appear consistent with the quantification presented in Figures 1F-H and should be reconciled.

      Thanks for this comment. We have now clarified this in the Figure 1 legend as follows,

      Lines 1026-1029: “Organoid formation assay performed immediately after luminal epithelial cell isolation and by plating equal numbers of viable ALDH<sup>LO</sup> (D) and ALDH<sup>HI</sup> (E) epithelial cells. ALDH<sup>LO</sup> and ALDH<sup>HI</sup> organoids were cultured for two weeks and passaged once prior to the organoid formation assays and transcriptomic analyses.”

      Regarding the second comment, we recognize that the images we showed may not have been the most representative of our quantification. As such, we replaced them with the organoid images below so that they better reflect the quantification outlined in Figure 1F-H.

      (2) The conclusion that ALDH1A1+ cells are enriched in populations with stem cell characteristics relies primarily on transcriptomic analysis. Protein-level co-localization should be performed to strengthen this claim.

      We thank the reviewer for this comment. Unfortunately, the antibodies for many of these stem cell markers (such as LGR5, AXIN2, and SUSD2) are not well-suited for immunostaining. Other markers that have been proposed in humans and are amenable to immunostaining are not suitable markers for mouse endometrial stem cells (such as CDH2). We hope that by showing that ALDH1A1 is expressed in patterns similar to those of the previously published stem cell markers LGR5 and AXIN2 (i.e., throughout the epithelium in the developing uterus and subsequently enriched in the tips of the endometrial glands of adult mice), together with our transcriptomic studies, we can demonstrate its utility as a marker for mouse endometrial stem cells.

      (3) The overlap of 19 genes between the data set here and AXIN2 HI data is presented as evidence of shared stemness identity, but no statistical assessment of this overlap is provided. A hypergeometric test should be performed to determine whether this overlap is greater than expected by chance.

      Thank you for this suggestion. We have performed a hypergeometric test and determined that the reported shared genes between the two datasets are greater than is expected by chance. We have updated the results section to state the following:

      Lines 133-141: "We determined that the overlap between ALDH<sup>HI</sup> and Axin2<sup>+</sup> stemness marker genes was significantly greater than expected by chance for both upregulated (21/346 genes, 1.81-fold enrichment, p = 0.0067) and downregulated (19/674 genes, 1.67-fold enrichment, p = 0.021) gene sets (hypergeometric test, universe = 23,182 genes)."
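      For readers who wish to check an overlap of this kind, the following is a minimal sketch of a one-sided hypergeometric test in plain Python. It assumes the conventional definitions: fold enrichment is the observed overlap divided by the overlap expected by chance (K·n/N), and the p-value is the upper tail P(X ≥ k). The function name `overlap_enrichment` and its parameters are hypothetical, and because the sizes of both gene sets are not all stated here, it is not intended to reproduce the reported values.

```python
from math import comb


def overlap_enrichment(universe: int, set_a: int, set_b: int, overlap: int) -> tuple[float, float]:
    """Return (fold_enrichment, one-sided p-value) for the overlap of two gene sets.

    universe: total number of genes considered (N)
    set_a:    size of the first gene set, e.g. stemness marker genes (K)
    set_b:    size of the second gene set, e.g. differentially expressed genes (n)
    overlap:  number of genes present in both sets (k)
    """
    if (set_a <= 0 or set_b <= 0 or max(set_a, set_b) > universe
            or not 0 <= overlap <= min(set_a, set_b)):
        raise ValueError("inconsistent set sizes")
    # Overlap expected if set_b were drawn at random from the universe.
    expected = set_a * set_b / universe
    fold = overlap / expected
    # Upper tail P(X >= overlap) of the hypergeometric distribution,
    # summed with exact integer arithmetic before a single final division.
    upper = min(set_a, set_b)
    tail = sum(
        comb(set_a, i) * comb(universe - set_a, set_b - i)
        for i in range(overlap, upper + 1)
    )
    p_value = tail / comb(universe, set_b)
    return fold, p_value
```

      On a toy case, `overlap_enrichment(10, 5, 5, 5)` returns a fold enrichment of 2.0 and p = 1/252 (about 0.004). Readers who use SciPy can cross-check the tail probability with `scipy.stats.hypergeom.sf(overlap - 1, universe, set_a, set_b)`.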

      (4) The impact of tamoxifen injection on Aldh1a1 expression should be characterized in the neonatal uterus, as tamoxifen itself has known estrogenic activity that could confound interpretation of the lineage tracing results at early postnatal timepoints.

      Although we took measures to control for this possibility by using multiple time-points and models to trace the impact of Aldh1a1<sup>+</sup> cells in development and adulthood, we recognize the importance of this comment and acknowledge that this is a limitation in the design of our study. We have added the following text to the Discussion acknowledging this point:

      Lines 434-442: “Given the well-documented impacts of tamoxifen in lineage tracing studies, it is imperative to use doses of tamoxifen that minimize estrogenic impacts and off-target effects (Rios et al., 2016). This often requires administration at doses that achieve maximal recombination of the desired gene while ensuring that the potential deleterious impacts of tamoxifen are minimized (Chen et al., 2023; Pimeisl et al., 2013). The cre/ERT2 tamoxifen-inducible model is widely used to study uterine biology, where it serves as a useful tool to interrogate the spatiotemporal impact of key genes, either through inactivation or for lineage tracing. Despite its widely documented utility across many tissue types and developmental timepoints, the use of tamoxifen and its impacts on the endometrium remain a limitation of our study, which we tried to address by implementing multiple timepoints, doses, and orthogonal assays in our experimental design.”

      (4b) Related to this, while low-dose tamoxifen is shown to label individual cells within 24 hours of injection, the translation dynamics of the label following Cre-mediated recombination can require up to 72 hours. The presence of only a few labeled clones at PND8 but multiple separate clones per cross-section at later timepoints warrants discussion and may reflect labeling kinetics rather than clonal expansion.

      The reviewer raises an important point. We agree that the 72-hour translation kinetics following cre-mediated recombination are a legitimate consideration for interpreting our data, and we have added the following text to the Discussion acknowledging this point:

      Lines 418-423: We hypothesized that the singly labeled cells observed in one-day tracing experiments expanded in a clonal fashion over the various timepoints we measured. We note that the translation kinetics of the label following cre-mediated recombination may contribute to the limited labeling observed at PND8/PND15, and that some cells may have been labeled with a delay of 24 to 72 hours after tamoxifen administration. However, the continuous increase in labeled cells at the subsequent timepoints favors our interpretation of clonal expansion as the primary explanation.

      (5) It would strengthen the in vivo ablation data to validate the degree of cell death following diphtheria toxin treatment directly. It is possible that a general decrease in cell number rather than specific loss of a stem cell population is responsible for the observed reduction in gland number and FOXA2 expression (Tongtong et al 2017).

      We agree that this is an important control to incorporate into our experimental design. To rule out this possibility, we performed immunohistochemistry for cleaved caspase 3 in the uterine tissues of DTR<sup>flox/flox</sup> and DTR<sup>flox/flox</sup>;Aldh1a1<sup>cre/ERT2</sup> mice 4 days after administration of diphtheria toxin. The results indicate similar levels of cleaved caspase 3 in both genotypes, suggesting that the decrease in FOXA2+ cells is not due to non-specific cell death, but rather the result of ALDH1A1<sup>+</sup> cell ablation. These data and the following text have been added to the manuscript:

      Lines 321-325: “We determined that the decrease in FOXA2<sup>+</sup> cells in the experimental mice was not the result of non-specific DT-mediated cell death, as similar levels of cleaved caspase 3-positive cells were detected in the DT-treated control ROSA26<sup>DTR/DTR</sup> and ROSA26<sup>DTR/DTR</sup>;Aldh1a1<sup>cre/ERT2/+</sup> mice 4 days post-diphtheria toxin administration (Figure S3G-H’).”

      (6) The lineage tracing data in the postpartum endometrium demonstrate that Aldh1a1-marked cells are present during regeneration, but it remains unclear whether these cells are preferentially activated or expanded in response to tissue injury. Coupling these studies with diphtheria toxin-mediated ablation during active regeneration would more directly test the proposed regenerative role of this population.

      This is a great point and one that we would be very interested in pursuing in future work. Regrettably, due to the long generation time and experimental procedures associated with these proposed studies, we are not able to include these experiments in the current manuscript. We have therefore changed our wording and conclusions throughout the manuscript to be less definitive about the role of Aldh1a1 in regeneration, since this will be the focus of future studies.

      The contribution of stromal Aldh1a1 lineage-positive cells is underexplored in the discussion, given the lineage tracing data showing stromal labeling across multiple timepoints and its potential relevance to mesenchymal-to-epithelial transition.

      Thank you for the suggestion. We have now expanded this section in the Discussion to include the following:

      Lines 497-505: We also found that ALDH1A1<sup>+</sup> stromal cells were more prevalent when tracing began in adult mice. Other studies have shown that mesenchymal cells contribute to endometrial regeneration in the postpartum phase or after induced menses through a process of MET (Cousins et al., 2014; Kirkwood et al., 2022; Li et al., 2025). Similarly, lineage tracing studies have shown that MET is an active process that contributes to epithelial cell regeneration in the postpartum phase (Huang et al., 2012; Patterson et al., 2013). Although this is an area of active investigation in the field, with some contradictory reports, it is plausible to hypothesize that endometrial tissue has the capacity to undergo wound healing and regeneration via several mechanisms (Ang et al., 2023; Ghosh et al., 2020). The process of MET in wound healing is widely documented in other organs, such as the kidney, liver, and lung, where MET is associated with depletion of the resident epithelial cell pool (Bi et al., 2012; Niayesh-Mehr et al., 2024; Zeisberg et al., 2005).

      Finally, the word 'control' may overstate the functional evidence presented. 'Contribute' may be more accurate given the partial and context-dependent nature of the phenotypes observed.

      We agree with the reviewer’s point that 'control' may overstate the evidence that we provide in the manuscript. To reflect this, we have edited the manuscript title and text accordingly.

      Reviewer #2 (Public review):

      Tang et al. investigated the contribution of Aldh1a1+ cells, as putative stem/progenitor cells, to endometrial development, maintenance during the estrous cycle, and postpartum repair in mouse models. They employed in vitro organoid formation and in vivo lineage tracing models coupled with RNA-seq to test the stem-ness of Aldh1a1+ cells. They found that mouse endometrial cells with high ALDH activity (using the ALDEFLUOR assay) formed more and larger organoids and were enriched for stem/progenitor cell gene signatures. Similar results were shown using endometrial cells from a human patient sample. Epithelial ALDH1A1 expression was shown to be hormonally regulated, becoming more restricted to the glands, a putative epithelial stem cell niche, under estrogen stimulation. Using lineage-tracing initiated postnatally/prepubertally, Aldh1a1+ epithelial cells were shown to expand, contributing to both the luminal and glandular epithelium into adulthood, whereas adult initiation of labeling showed expansion of stromal Aldh1a1+ cells but not epithelial. Postnatal ablation of single-labeled Aldh1a1+ epithelial cells resulted in impaired gland development. Lastly, Aldh1a1-lineage traced cells (adult labeled) were present during postpartum endometrial repair as were epithelial/mesenchymal transitional cells.

      This study addresses an important area of research in the field of endometrial stem/progenitor cell biology. The authors are commended for their use of multiple complementary methods, including lineage tracing, DTR-mediated cell ablation, organoid assays, and RNA-seq in mouse and human models to assess the stem-like nature of Aldh1a1+ cells. The data support the stem/progenitor phenotype of Aldh1a1+ epithelial cells during endometrial development; however, there are noted discrepancies between organoid formation assays and lineage tracing experiments regarding the stemness of Aldh1a1+ epithelial cells in adults. Specifically, organoids were generated from adult cells and demonstrated in vitro stem cell activity; however, in vivo lineage-tracing of adult cells either during the estrous cycle or postpartum repair does not show expansion of Aldh1a1+ cells, suggesting they do not have stem/progenitor activity. Additionally, the stem-ness of epithelial vs stromal Aldh1a1+ cells is confounded in the study because epithelial cells were not purified for organoid experiments, epithelial cells were not exclusively lineage-traced as stromal cells were also labeled, and mesenchymal-epithelial transition was suggested to occur during postpartum repair. The following specific comments are presented to detail these concerns:

      We thank the reviewer for their critical reading of our manuscript and constructive comments.

      (1) The statement in the brief summary, "...critical for lifelong endometrial regeneration," is not supported by the data provided.

      We have edited the brief summary to exclude this statement; it now reads as follows:

      Lines 4-5: “We uncover ALDH1A1<sup>+</sup> cells as a group of hormone sensitive stem cells contributing to endometrial development and regeneration.”

      (2) ALDH1A1 is not restricted to the endometrial epithelium, and epithelial cells were not purified by flow cytometry for experiments in Figure 1. Figure 2 clearly shows the presence of mesenchymal cells, even using the described method for enriching for epithelial cells. Therefore, contaminating mesenchymal cells with high ALDH activity may confound the experimental results in Figure 1, either through promoting epithelial cell growth or through MET. The authors should provide clear evidence of epithelial purity in organoid experiments or that mesenchymal cells are not contained in the ALDHhi population. These comments also apply to the human organoid experiments in Figure 7.

      We thank the reviewer for raising this important point. Our group has routinely used the enzymatic method to separate epithelial from stromal cell populations from the mouse uterus (see references dating back to 2015, PMID 26721398, 28324064, 34099644). In these experiments, we typically obtain >98% purity in both the epithelial and stromal cell compartments. This purity can be directly observed in the immunofluorescence images shown below, where mouse endometrial epithelial and stromal cells were enzymatically separated and immunostained with E-cadherin and vimentin antibodies to detect epithelial and mesenchymal cells in both cell preparations. The images show very few contaminating epithelial or stromal cells in either preparation. We have observed similar results when preparing epithelial and stromal cells from the human endometrium, where immunostaining shows that ~100% of the cells in the epithelial organoids express epithelial markers.

      Author response image 1.

      Purity of mouse endometrial epithelial cells obtained via enzymatic and mechanical dissociation. (A-B) Epithelial (A) and stromal (B) cells plated on glass coverslips and immunostained with an epithelial cell marker (cytokeratin 8, red), a stromal cell marker (vimentin, green), and DAPI.

      Author response image 2.

      Human endometrial epithelial organoids were fixed and immunostained with cytokeratin 8 (green) and DAPI. The images are typical for our epithelial cell cultures and demonstrate that all epithelial cells are CK8-positive.

      (3) Lines 186-187: Susd2 was increased in EpSC clusters, yet this is a mesenchymal stem/progenitor marker in humans. The authors should discuss the implications of this.

      We thank the reviewer for highlighting this. We have now included the following in our Discussion to address this point:

      Lines 528-533: Clustering with this population of EpSCs were Susd2<sup>+</sup> cells, which are well-characterized mesenchymal progenitors enriched in the perivascular regions of the human endometrium (Darzi et al., 2016; Khanmohammadi et al., 2021). The presence of Susd2<sup>+</sup> cells, while unexpected in an epithelial stem cell niche, could indicate a transitional mesenchymal or perivascular cell that is differentiating into epithelium. Contributions of both mesenchymal cells and Nestin2<sup>+</sup> pericytes to the mouse endometrial epithelium have recently been described (Kirkwood et al., 2022; Li et al., 2025).

      (4) In Figure 5, RFP+ epithelial cells should be quantified as in previous figures to substantiate the statement in lines 279-280, "At PPD5, the proportion of RFP+ epithelial cells had expanded relative to PPD1 and PPD3 (Figure 5E-E')." Especially because in the low mag images (C-E), RFP+ epithelial cells appear to be most abundant at PPD1 and decrease at PPD3 and PPD5, suggesting that they may not be involved in endometrial regeneration/repair (contradicting the interpretation in line 285). Further, if there is in fact a decrease over postpartum repair, then regeneration should be removed from the title of the manuscript. RFP+ stromal cells should also be quantified.

      We appreciate this reviewer’s comment and agree that as stated, the conclusion is not fully supported by the data. To address this comment, we have edited the results so that they clearly indicate the results and remove any ambiguity:

      As requested, we quantified the number of RFP+ stromal and epithelial cells during the postpartum phase and noted that RFP+ cells were prominent in the stromal compartment of the endometrium. While RFP+ epithelial cells were also observed at these timepoints, they were less abundant than RFP+ stromal cells. Because the number of RFP+ cells did not change significantly over the postpartum phases in either the stromal or the epithelial compartment, we have modified our conclusion to state that ALDH1A1+ cells are transiently detected in the regenerating endometrium.

      Results:

      Lines 286-295: “By analyzing the uterine tissues near the placental detachment site, we observed that RFP-positive cells were prominent in the endometrial stromal cells adjacent to the luminal epithelium (Figure 5C-C’, green arrows). RFP<sup>+</sup> cells were also observed in the stromal cells near the placental detachment sites at PPD1 and PPD3 (Figure 5D’-E’, red & blue arrows) and in a limited number of luminal epithelial cells (Figure 5D”,E”). Quantification of RFP<sup>+</sup> cells throughout these postpartum phases indicated that ALDH1A1<sup>+</sup> stromal cells (360 ± 103, PPD1, n=3; 217 ± 107, PPD3, n=3; 254 ± 32, PPD5, n=4) were more frequent than ALDH1A1<sup>+</sup> epithelial cells (65 ± 65, PPD1, n=3; 20 ± 10, PPD3, n=3; 114.25 ± 39, PPD5, n=4) in the regenerating endometrium (Figure S4).”

      Discussion:

      Lines 513-521: “We also noted that a majority of ALDH1A1<sup>+</sup> cells were localized to the active areas of endometrial regeneration near the placental detachment sites at PPD1, with pronounced expression in the sub-epithelial stromal cells. As regeneration progressed, we continued to observe ALDH1A1<sup>+</sup> cells in the stromal compartment within the placental detachment sites at PPD3 and PPD5, with a progressive, but not statistically significant, increase in ALDH1A1<sup>+</sup> epithelial cells. Collectively, our data demonstrate that ALDH1A1<sup>+</sup> lineage cells participate in the restoration of endometrial architecture and functional compartments in the postpartum phase, even if their direct contribution is transient. Future detailed and mechanistic studies will be necessary to fully characterize their role in this process and their long-term consequences for postpartum regeneration.”

      (5) For Figure 7F, it should be clearly stated in the main text that the results are from one patient sample and the data presented are experimental replicates, so as not to be confused with biological replicates (the same for Supplementary Figure S4). Were B and G in Figure 7 also from one patient?

      Thanks for pointing this out. We have edited the figure legends in the main text and supplemental figures to indicate this.

      Lines 337-338: “…main figures show representative results from one patient sample performed in technical replicates, with additional patient samples included in the supplement…”

      (6) Lines 425-427: "Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed a complete restriction of ALDH1A1 to the glandular crypts." In Figure 2 S' ALDH1A1+ cells are visible in the LE (the staining is lighter than in the GE but looks real), contradicting this statement.

      This is an important distinction. We have now edited this part of the manuscript to state:

      Lines 459-462: “Ovariectomized mice treated with 90-day E2 pellets, on the other hand, showed enriched ALDH1A1 in the glandular crypts with weak luminal epithelial staining, while the ovariectomized controls had strong ALDH1A1 expression throughout the luminal and glandular epithelium.”

      (7) Lines 466-467: "In cycling mice, we found sporadic cells that expressed both stromal and epithelial markers in the ALDHA1+ cells." These data are not presented.

      We apologize for the confusion; this sentence has been removed from the Discussion.

      (8) These data support the role of Aldh1a1+ cells in endometrial epithelial development, but conclusions about their role in repair/regeneration should be tempered as the data are much weaker here.

      We thank the reviewer for their overall assessment. To address this point, we have thoroughly edited the appropriate areas to temper the conclusions and ensure that they are strongly supported by our data. We have also edited the manuscript’s title to reflect this.

      Reviewer #3 (Public review):

      Summary:

      Tang et al. demonstrated the importance of ALDH-high cells in epithelial development in the mouse endometrium, and these cells displayed properties of stem cells.

      We thank the reviewer for their assessment of our manuscript.

      Strengths:

      The findings are solid, supported and validated through a combination of technical methods. I appreciated the combined use of mouse and human endometrial cells to strengthen the findings. Genomic results from a single-cell sequencing dataset were informative, as they depicted the different stages of the estrous cycle during the regeneration process. Verification by immunostaining with various markers made it easy for readers to visualize the cells' location, progression, and status at different timepoints. Utilizing human endometrial cells further demonstrated that the phenomenon observed in mice can be translated to humans.

      This work will greatly advance the understanding of endometrial regeneration for reproductive biologists.

      We thank the reviewer for their expert assessment and positive comments regarding our manuscript.

      Weaknesses:

      No major weaknesses were identified by this reviewer.

      Reference

      Ang, C.J., Skokan, T.D., and McKinley, K.L. (2023). Mechanisms of Regeneration and Fibrosis in the Endometrium. Annu Rev Cell Dev Biol 39, 197-221.

      Bi, W.R., Jin, C.X., Xu, G.T., and Yang, C.Q. (2012). Bone morphogenetic protein-7 regulates Snail signaling in carbon tetrachloride-induced fibrosis in the rat liver. Exp Ther Med 4, 1022-1026.

      Chen, M.Y., Zhao, F.L., Chu, W.L., Bai, M.R., and Zhang, D.M. (2023). A review of tamoxifen administration regimen optimization for Cre/loxp system in mouse bone study. Biomed Pharmacother 165, 115045.

      Cousins, F.L., Murray, A., Esnal, A., Gibson, D.A., Critchley, H.O., and Saunders, P.T. (2014). Evidence from a mouse model that epithelial cell migration and mesenchymal-epithelial transition contribute to rapid restoration of uterine tissue integrity during menstruation. PLoS One 9, e86378.

      Cousins, F.L., Pandoy, R., Jin, S., and Gargett, C.E. (2021). The Elusive Endometrial Epithelial Stem/Progenitor Cells. Front Cell Dev Biol 9, 640319.

      Darzi, S., Werkmeister, J.A., Deane, J.A., and Gargett, C.E. (2016). Identification and Characterization of Human Endometrial Mesenchymal Stem/Stromal Cells and Their Potential for Cellular Therapy. Stem Cells Transl Med 5, 1127-1132.

      Ghosh, A., Syed, S.M., Kumar, M., Carpenter, T.J., Teixeira, J.M., Houairia, N., Negi, S., and Tanwar, P.S. (2020). In Vivo Cell Fate Tracing Provides No Evidence for Mesenchymal to Epithelial Transition in Adult Fallopian Tube and Uterus. Cell Rep 31, 107631.

      Huang, C.C., Orvis, G.D., Wang, Y., and Behringer, R.R. (2012). Stromal-to-epithelial transition during postpartum endometrial regeneration. PLoS One 7, e44285.

      Khanmohammadi, M., Mukherjee, S., Darzi, S., Paul, K., Werkmeister, J.A., Cousins, F.L., and Gargett, C.E. (2021). Identification and characterisation of maternal perivascular SUSD2(+) placental mesenchymal stem/stromal cells. Cell Tissue Res 385, 803-815.

      Kirkwood, P.M., Gibson, D.A., Shaw, I., Dobie, R., Kelepouri, O., Henderson, N.C., and Saunders, P.T.K. (2022). Single-cell RNA sequencing and lineage tracing confirm mesenchyme to epithelial transformation (MET) contributes to repair of the endometrium at menstruation. Elife 11.

      Li, S.Y., Whiteside, S., Li, B., Sun, X., and DeFalco, T. (2025). Mesenchymal-to-epithelial transition of perivascular cells contributes to endometrial re-epithelialization. Nat Commun 16, 10174.

      Niayesh-Mehr, R., Kalantar, M., Bontempi, G., Montaldo, C., Ebrahimi, S., Allameh, A., Babaei, G., Seif, F., and Strippoli, R. (2024). The role of epithelial-mesenchymal transition in pulmonary fibrosis: lessons from idiopathic pulmonary fibrosis and COVID-19. Cell Commun Signal 22, 542.

      Patterson, A.L., Zhang, L., Arango, N.A., Teixeira, J., and Pru, J.K. (2013). Mesenchymal-to-epithelial transition contributes to endometrial regeneration following natural and artificial decidualization. Stem Cells Dev 22, 964-974.

      Pimeisl, I.M., Tanriver, Y., Daza, R.A., Vauti, F., Hevner, R.F., Arnold, H.H., and Arnold, S.J. (2013). Generation and characterization of a tamoxifen-inducible Eomes(CreER) mouse line. Genesis 51, 725-733.

      Rios, A.C., Fu, N.Y., Cursons, J., Lindeman, G.J., and Visvader, J.E. (2016). The complexities and caveats of lineage tracing in the mammary gland. Breast Cancer Res 18, 116.

      Seishima, R., Leung, C., Yada, S., Murad, K.B.A., Tan, L.T., Hajamohideen, A., Tan, S.H., Itoh, H., Murakami, K., Ishida, Y., et al. (2019). Neonatal Wnt-dependent Lgr5 positive stem cells are essential for uterine gland development. Nat Commun 10, 5378.

      Zeisberg, M., Shah, A.A., and Kalluri, R. (2005). Bone morphogenic protein-7 induces mesenchymal to epithelial transition in adult renal fibroblasts and facilitates regeneration of injured kidney. J Biol Chem 280, 8094-8100.

    1. eLife Assessment

      This study introduces the "Training Village," a valuable system for which solid evidence shows that it enables group-housed rodents to autonomously learn complex tasks while preserving natural social interactions. The platform is flexible, allowing animals to learn multiple tasks sequentially and supporting applications in continual learning. This approach is likely to be of broad interest to behavioral researchers using rodent models in systems and cognitive neuroscience.

    2. Reviewer #1 (Public review):

      Summary:

      The authors introduce the Training Village (TV), an open-source and modular system that allows group-housed rodents to live in enriched home cages while individually accessing a single shared operant box for automated cognitive training. The paper reported the animals' activity both in the operant box and in the home cages, which is novel.

      Strengths:

      A major strength of the work is that it moves beyond a proof-of-concept and demonstrates sustained box usage, long-term trial accumulation, and compatibility with different task designs.

      (1) The platform provides a technical contribution to rodent cognitive neuroscience: obtaining large amounts of behavioral data from complex tasks while reducing experimenter intervention and preserving social housing.

      (2) The authors demonstrate that the system can sustain prolonged task engagement (up to 12 months) and maintain efficient use of a single operant box.

      (3) The manuscript opens interesting opportunities for studying behavior outside standard session-based training. Because animals self-initiate training while remaining in a group-housed setting, the platform has the potential to illuminate relationships among motivation, spontaneous activity, and task engagement that are hard to access in conventional paradigms.

      Weaknesses:

      (1) One area that would benefit from further clarification is the manuscript's core advance relative to prior automated group-housed training systems, particularly Mouse Academy (Qiao et al., 2018). The authors list some advantages in the Discussion; however, these are largely minor engineering improvements, whereas what is more interesting are the scientific questions that can be asked, and the results that can be obtained, with this system. The current study clearly presents a functional and carefully documented platform, but it would help the reader if the authors more explicitly distinguished the present system from earlier related approaches, both in terms of system design and in terms of experimental validation.

      (2) At the system level, several of the claimed advantages could be supported more directly with quantitative data. For example, if the double-detection corridor and alarm system are important distinguishing features, it would be valuable to report measures such as detection accuracy, missed detections, co-entry failures, alarm frequency, and the degree of manual intervention required in practice. Similarly, the welfare-related arguments are plausible and important, but would be strengthened by more direct evidence, such as longitudinal body weight data, water intake, or comparison with group-housed no-task controls.

      (3) At the experimental level, the manuscript would also benefit from a more detailed characterization of training performance. Although three behavioral paradigms are presented, the data currently shown provide a stronger demonstration of feasibility than of training optimization. For a study focused on automated cognitive training, it would be critical to include more information on learning speed, progression across stages, success and failure rates, and variability across animals. Along the same lines, the comparison with manual training is a useful addition, but a broader benchmark including learning curves, time to criterion, and between-animal variability would make the practical value of the system easier to assess.

      (4) The authors describe the three tasks implemented in their setup (3AFC, 2AFC, 2AB) as complex cognitive tasks. However, these tasks are relatively basic for rodents and have been demonstrated in many studies, especially when compared with the tasks implemented in Yu et al., eLife 2025. The 'complex' characterization should therefore be toned down.

      (5) The authors claimed that they have successfully implemented the so-called hybrid mode, but it is only briefly described and not supported by citations or data. Since this may be one of the most broadly applicable use cases of the platform, a more detailed explanation of how the system can be integrated with recording workflows would strengthen the manuscript.

      (6) The manuscript highlights the opportunity to relate task behavior to home-cage activity and to study individualized behavioral patterns. To better support these aspects, it would be helpful to include more subject-level analyses, rather than relying predominantly on population averages, or alternatively to discuss in more concrete terms which features of the dataset may be especially informative for studying individuality. More generally, the manuscript would benefit from clarifying whether different parameter settings within this group-housed framework may be better suited for maximizing training efficiency versus preserving more naturalistic or socially modulated behavior, and what the implications of these choices may be for interpretation.

      (7) In Table S1, 'Touch screen' is task-specific and is not necessarily a metric. 'Testing outside home cage' is also not necessarily an advantage (please clarify if it is). Many other systems implement alarm systems of varying sophistication, which is not reflected in the table.

      (8) Table S3 shows important data that help the reader evaluate the paper's work, and therefore deserves to be moved to the main text.

    3. Reviewer #2 (Public review):

      Summary:

      The Training Village (TV) is an innovative autonomous system for rodent training. By integrating an operant box with a group-housed home-cage environment, this platform enables animals to learn operant behaviors while preserving their social context and interactions, which is an aspect often overlooked in the field. The flexibility and modularity of the TV system allow training across multiple cognitive tasks in a continual learning framework. Furthermore, its remote accessibility and affordability make it a compelling tool for the broader neuroscience community.

      Comments:

      (1) Social Hierarchy and Access Competition

      Previous studies on rodent social hierarchy (e.g., PMID: 21960531) have demonstrated clear dominance structures within group-housed animals. Based on this, one might expect dominant animal(s) to occupy more sessions and trials than subordinate animals by preferentially accessing the operant box. Therefore, it is somewhat surprising to observe a relatively uniform distribution of operant box occupancy across animals (Figure 2a, 2i). As a control, it would strengthen the manuscript to include an independent assessment of social hierarchy (e.g., tube test, barbering assay, or similar behavioral metrics) to quantitatively characterize dominance relationships within the cohort. Correlating these rankings with chamber occupancy and trial frequency would significantly strengthen the validation of the system's equity.

      (2) Behavioral Savings Effects in Continual Learning

      The authors demonstrate that the TV platform allows for the sequential learning of multiple cognitive tasks (Figure S3e). This provides an excellent opportunity to examine a continual learning paradigm. A key hallmark of successful continual learning is the "behavior savings effect", where re-learning a previously acquired task occurs faster than initial learning. For example, if animals are trained sequentially on task A (e.g., 2AFC), then task B (e.g., 2AB), and subsequently re-trained on task A, do they exhibit accelerated re-learning? Including such an analysis would significantly strengthen the claim regarding continual learning capabilities.
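
      A minimal sketch of how such a savings effect could be quantified from trial-by-trial outcomes, assuming a trials-to-criterion measure. The criterion used here (70% correct over a 100-trial sliding window) and the function names are hypothetical illustrations, not taken from the manuscript.

```python
from typing import Optional, Sequence


def trials_to_criterion(outcomes: Sequence[int], window: int = 100,
                        threshold: float = 0.7) -> Optional[int]:
    """Number of trials until accuracy over the last `window` trials first
    reaches `threshold`, or None if it never does.
    `outcomes` holds 1 for a correct trial and 0 for an error."""
    correct_in_window = 0
    for i, outcome in enumerate(outcomes):
        correct_in_window += outcome
        if i >= window:
            # Drop the trial that just slid out of the window.
            correct_in_window -= outcomes[i - window]
        if i + 1 >= window and correct_in_window / window >= threshold:
            return i + 1
    return None


def savings(initial_trials: int, relearn_trials: int) -> float:
    """Fraction of the initial learning cost spared when re-learning.
    Positive values indicate faster re-learning (a savings effect)."""
    return (initial_trials - relearn_trials) / initial_trials
```

      Comparing trials to criterion for the first exposure to task A with the re-exposure after task B, per animal, would give a direct savings estimate for each individual.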

      (3) Robustness of Multi-Animal Attempt Detection

      In the TV platform, only one animal can access the operant box at a time under group-housed conditions. This setup inherently introduces the possibility of "multi-animal attempts", as shown in Figure 2j-k and Figure S2c. While the authors address this using pixel-based classification, additional quantitative validation would improve confidence in this approach. For instance, presenting the distribution of pixel counts for single-animal versus multi-animal events would be informative. Moreover, given variability in body size across animals, a fixed pixel threshold may not be sufficient. It would be helpful to include analyses of classification performance (e.g., Type I and Type II error rates) across different animal pairings within the same cohort.

      (4) Protocol Flexibility and Implementation

      It would be helpful to clarify how behavioral task protocols are switched within the TV system. Specifically, are task changes applied globally to all animals sharing the operant box, or can they be assigned individually? Additionally, are task sequences pre-programmed prior to the experiment, or can they be modified dynamically during ongoing experiments?

      (5) Presentation and Readability

      To improve readability, the Discussion section could be streamlined, as it is currently somewhat lengthy and descriptive.

    4. Reviewer #3 (Public review):

      Summary:

      The Training Village (TV) is an open-source automated platform for continuous training and testing of group-housed mice and rats in cognitive tasks. Animals live in enriched multi-compartment home cages and access a single operant box individually through a sorting corridor controlled by RFID identification and real-time video analysis. A Raspberry Pi 5 runs the entire system, manages an adaptive training algorithm, monitors animal welfare, and allows remote supervision via a graphical interface and Telegram alarm system. The system is validated across 12 groups totaling 121 animals, three cognitive paradigms of varying complexity, and experiments lasting up to 12 months.

      Strengths:

      (1) The open-source implementation is probably the paper's strongest point. The authors provide not just code but 3D-printable designs, a full bill of materials with costs (~5500€ total), assembly instructions, and a dedicated website. The estimated build time of 2-7 days is credible. In the current landscape of methods papers, this level of documentation is the minimum necessary to allow other laboratories to actually adopt and propagate the system - and the authors deliver it fully. The compatibility with two operant box designs, three cognitively distinct tasks, and two species - demonstrated empirically rather than merely claimed - makes the modularity argument credible and distinguishes the TV from systems designed around a single paradigm. Finally, the combination of automatic weighing at each exit, temperature and humidity tracking, and a granular Telegram alarm system (Table S2) represents a meaningful practical contribution. For a system operating 24/7 without daily human supervision, this level of welfare monitoring is a necessity, and it seems well implemented here.

      (2) With 121 animals across 12 groups, three distinct cognitive paradigms, two species, and longitudinal data spanning up to 12 months, the validation effort is substantial. The authors acknowledge the limitations of their comparisons - notably that the TV vs. manual training comparison is not a controlled experiment. The rat dataset is limited in scope, but the authors at least demonstrate that the system can be adapted to a second species, which is a useful proof of concept. The demonstration that task engagement increases progressively over 12 months (Fig. 3g) is a novel observation at this temporal scale, with practical implications for the design of long-term experiments.

      (3) The demonstration that operant box usage is distributed nearly uniformly across animals (Gini < 0.15 in all groups) is carefully demonstrated and addresses a question that any laboratory considering this type of system will legitimately ask, e.g., whether dominant individuals monopolize access at the expense of subordinates. This has been shown before in comparable systems, but remains a necessary validation for each new implementation. The control condition removing temporal constraints (Figure S4) adds useful mechanistic insight into the role of the refractory interval. However, the interpretation of this result deserves more nuance than the authors provide - see Weaknesses.
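
      For reference, one common way to compute a Gini coefficient on per-animal occupancy (e.g., sessions per animal) is sketched below. This review does not state which estimator the authors used, so the function is illustrative only.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 for perfectly equal shares,
    (n - 1) / n when a single animal holds everything.
    Sorted-rank formula: G = sum_i (2i - n - 1) * x_(i) / (n * sum(x))."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)
```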

      Weaknesses:

      (1) The TV is more than an automation tool; its architecture makes the most sense if one intends to study how spontaneous home cage behavior relates to individual cognitive performance, and the introduction and discussion explicitly frame this as a key application. Yet the analysis delivers only group-level descriptive results, and the cognitive data are presented almost exclusively as group averages. The individual-level questions that the system is uniquely positioned to address (do stable home cage behavioral profiles emerge across animals, do animals learn at the same rate and using the same strategies, and do these dimensions correlate with each other) are never asked. This is particularly relevant given that enriched social environments are precisely the conditions under which stable inter-individual differences tend to emerge spontaneously, even among genetically identical animals (Freund et al., 2013, Science), and that comparable systems have already linked such profiles to cognitive and neurochemical phenotypes (Torquet et al., 2018, Nature Communications). The TV clearly has the data to begin exploring this - doing so would substantially strengthen the paper's scientific contribution beyond its methodological value.

      (2) Sustained daytime operant box usage in nocturnal animals deserves more discussion. Box occupancy during the light phase remains around 75%, only modestly below the ~85% seen at night (Fig. S5a-b). The authors conclude that this reflects "sustained engagement with the task throughout the circadian cycle," but alternative explanations are not considered, such as residual thirst driving animals to seek sucrose water during the day, or the refractory interval mechanically redistributing sessions into the light phase. A more explicit discussion of the consequences of 24/7 unsupervised testing for data quality (for example, whether daytime sessions yield noisier behavioral data) would be useful.

      (3) The finding that all animals access the operant box in roughly equal proportions (Gini < 0.15) is practically important and carefully demonstrated. However, the authors' interpretation that animals self-organize in an egalitarian manner despite known social hierarchies deserves a note of caution. The system design itself constrains monopolization: the refractory interval imposes the same waiting time on all animals regardless of social rank, and session duration determines how often the box becomes available. The no-constraint control (Figure S4) partially addresses this but was run on already-trained animals, limiting its interpretive value. The key practical message, that all animals can access the task regularly under the proposed design, is well supported. Whether this reflects genuine social tolerance or is primarily a consequence of system constraints is a subtler question that the current data cannot fully resolve.

      (4) The rat cohort consists of a single group of 6 female Long-Evans rats, yet species comparisons are drawn across multiple dimensions (daily sessions, task engagement, performance...). Observed differences could reflect group size, sex, strain, reward calibration, or simple individual variability rather than species differences. These results should be presented for what they are: a useful proof of concept showing the system works with a second species, not a basis for comparative conclusions.

    1. eLife Assessment

      This study provides a valuable contribution to our understanding of the neural basis of perceptual decision-making by jointly modeling behavioral outcomes and EEG signals in a contrast comparison task. The methods and analyses are solid, systematically comparing standard models assuming continuous evidence accumulation with models that track evidence without temporal integration (extrema detection). The authors show that behavior and neural signals are equally consistent with both alternatives, highlighting limitations in current modeling approaches and questioning the generality of evidence accumulation mechanisms.

    2. Reviewer #1 (Public review):

      Summary:

      This paper examines whether humans use protracted temporal integration in a noise-free, deferred-response contrast discrimination task, using a covert evidence-duration manipulation combined with EEG (SSVEP, CPP, Mu/Beta). The key finding is that evidence for protracted sampling is behaviorally and neurally supported, but even joint CPP + behaviour fitting cannot fully discriminate a standard integration (DDM) model from a novel "extremum-flagging" non-integration model. The paper is transparent about this outcome.

      Strengths:

      This is a well-conducted and well-written study that makes a genuine contribution to the perceptual decision-making literature by introducing a clean experimental design for probing temporal integration without participants adapting their strategy and demonstrating for the first time that a non-integration model (extremum-flagging) can replicate CPP waveform dynamics that have long been considered hallmarks of evidence accumulation. The transparent treatment of equivocal modelling outcomes is commendable.

      Weaknesses:

      My main concerns relate to statistical power and the under-specification of the extremum-flagging mechanism. Addressing these would greatly strengthen the paper.

      (1) The sample of 16 participants (15, after the exclusion of one participant) is described as "close to similar EEG studies" with no formal power analysis. Given that the paper's core claim rests on subtle quantitative differences between two model classes - differences that are, by the authors' own admission, not sufficient to declare a winner - even a modest increase in sample size might yield a more decisive outcome. At a minimum, the authors should report a sensitivity analysis or post-hoc power calculation to indicate what effect sizes the current N could reliably detect, particularly for the rmANOVA comparisons and the neural constraint fitting.
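
      A rough sensitivity sketch for the simplest case, a two-condition within-subject contrast, using a normal approximation. It ignores the repeated-measures ANOVA structure and the neural-constraint fitting, and the normal approximation is optimistic at small N (an exact noncentral-t calculation gives a somewhat larger detectable effect), so the number is a lower bound rather than the analysis the authors should report.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized effect (Cohen's d_z) that a two-sided paired
    comparison with n participants detects at the given power.
    Normal approximation: d = (z_{1 - alpha/2} + z_{power}) / sqrt(n)."""
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


print(f"N = 15: d_z >= {min_detectable_d(15):.2f}")  # prints N = 15: d_z >= 0.72
```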

      (2) The Extremum-flagging model is the paper's most novel contribution, yet its physiological basis is underspecified. The model posits that each decision-terminating bound-crossing triggers a stereotyped, half-sine-shaped centroparietal signal, but no neural circuit or computational mechanism is proposed for how the brain could detect the first bound-crossing event in a non-accumulating evidence stream or generate a temporally precise, fixed-amplitude signal in response. Possible connections to P3b theories of context updating and response facilitation are acknowledged, but these are vague functional descriptions rather than mechanistic accounts. I think the discussion should engage more directly with potential neural substrates that could generate this flagging signal, and whether these are consistent with the known generators of the CPP/P3b. Without this, the extremum-flagging model risks being viewed as a mathematical convenience rather than a biologically plausible alternative.

      (3) The Integration model at the preferred neural weighting estimates a high-to-low contrast drift rate ratio of 8.7, whereas the empirical Mu/Beta lateralization slopes suggest a ratio of approximately 3.5. The authors attribute this discrepancy to the nonlinear contrast response function of early visual cortex and the salience of the high-contrast evidence onset, but these explanations are speculative. These outcomes are arguably the most quantitatively damaging result for the integration model, so they deserve more than a brief discussion. I would recommend that the authors (a) estimate what range of contrast response nonlinearities would be required to close this gap, (b) test whether an alternative drift rate parameterization (e.g., scaling drift rates directly by SSVEP amplitude rather than contrast) reduces the discrepancy, or (c) be more explicit about treating this as a point against the Integration account.
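
      As an illustration of (a), suppose, purely as an assumption, that drift rate scales as a power function of the response difference indexed by the Mu/Beta slopes, v ∝ r^k. Closing the gap between the two ratios would then require:

```latex
\frac{v_H}{v_L} = \left(\frac{r_H}{r_L}\right)^{k}
\quad\Longrightarrow\quad
k = \frac{\ln(v_H/v_L)}{\ln(r_H/r_L)} = \frac{\ln 8.7}{\ln 3.5}
\approx \frac{2.163}{1.253} \approx 1.73
```

      That is, an expansive mapping with exponent around 1.7; a compressive mapping (k < 1) would widen rather than close the gap.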

      (4) The sensitivity analysis over neural constraint weightings (w = 0.1 to 1000) is thoughtful, but the paper ultimately acknowledges that the preferred weighting is w=10, chosen because it achieves "a good fit to CPP dynamics without substantively sacrificing behavioral fit" - a qualitative criterion. No principled statistical framework is used to select the optimal weighting or to compare models at a given weighting. A Bayesian model comparison could provide a more formal framework for combining behavioral and neural fit components, and would allow a clearer statement about the relative posterior probability of each model.
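
      One coarse way to put this on a probabilistic footing, sketched under the assumption that the behavioral and CPP components are combined into a single likelihood (e.g., by modelling CPP residuals as Gaussian) rather than a weighted sum with an arbitrary w, is to convert BIC values into approximate posterior model probabilities under equal priors. The function is hypothetical; a full Bayesian comparison would estimate marginal likelihoods directly.

```python
from math import exp, log
from typing import Dict, Tuple


def bic_posterior(models: Dict[str, Tuple[float, int]], n_obs: int) -> Dict[str, float]:
    """Approximate posterior model probabilities (Schwarz/BIC approximation,
    equal priors). `models` maps name -> (maximized log-likelihood, n free params)."""
    bic = {name: k * log(n_obs) - 2 * ll for name, (ll, k) in models.items()}
    best = min(bic.values())  # subtract the minimum so exp() cannot underflow
    weights = {name: exp(-(b - best) / 2) for name, b in bic.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```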

    3. Reviewer #2 (Public review):

      Summary:

      The manuscript by Hajimohammadi, Mohr, O'Connell and Kelly is intended to demonstrate that participants integrate evidence over time to make a decision, even in a noise-free, static decision context. This is validated by the observation that (1) participant accuracy improves with increased exposure to the stimulus; and (2) there is a correlation between participant accuracy and a neural index of evidence accumulation, as measured by centro-parietal positivity (CPP).

      Strengths:

      (1) Joint modelling of accuracy and CPP dynamics is a significant achievement, as behaviour alone often cannot distinguish between competing theories of decision-making. In the case of protracted sampling in particular, the absence of reaction times (RT) due to the delayed nature of the response makes this method highly appealing.

      (2) The experimental manipulations and the method used to extract the different neural indices are well chosen, enabling the mapping of putative cognitive processes such as evidence accumulation and motor preparation onto the recorded EEG with clarity.

      (3) The in-depth discussion clearly relates the authors' results to those reported in previous work.

      Weaknesses:

      (1) One main issue to support the interpretation of the authors toward the need for protracted sampling is the timing of the evidence. By design, participants believe that the signal is present for 1.6 seconds (reinforced by the fact that easy trials were displayed for 1.6 seconds). However, the difference in stimuli is turned off either 1.4, 1.2, 0.8 or 0 seconds before the cue to respond. While this makes sense in the context of the authors' question, it also raises the possibility that participants will focus on the last samples before answering. Even if participants apply equal weighting, this still favours them delaying evidence accumulation until they are sufficiently certain that the evidence should be present (e.g. participants might start accumulating after the stimulus has disappeared in the 0.2 condition). I do not see an easy way to test these alternative explanations outside of running a study in which the evidence is always offset before the go cue.

      (2) Regarding the behavioural models, are these identifiable based on accuracy data alone? This should be addressed using a parameter recovery study, in which a set of parameters is used to generate data, and the same fitting routine used for the real data is used to estimate the parameters. This would enable us to determine what can be inferred from the model comparison presented. This is not a serious problem for the manuscript, as it specifically aims to go beyond behaviour. It is, however, worth noting that such a parameter recovery analysis could be used to demonstrate the need for a joint modelling framework to address the question of protracted sampling in tasks with delayed responses, where reaction times (RT) cannot constrain the models.

      Minor comments:

      (1) I would advise authors to fix the D1 parameter and use it as a scaling parameter across all models. Currently, as I understand it, the models are scale-free, meaning the same fit is achieved by multiplying all parameters by two, for example. This makes the fit more complex (bounds on parameter values are required) and means that the models are less comparable in terms of their estimates. Perhaps I'm missing something, but I would have thought that fixing D1 (the common parameter across all models) would solve these issues.

      (2) Why does the snapshot model perform so poorly despite performing well in Stine et al. (2020)? Can the authors speculate in the discussion?

      (3) The meaning of the flag width is unclear. Figure 4 provides the reader with an intuitive understanding of the model that the authors have in mind. However, the tables in the appendices report values between 0.2 and 0.9. I understand that these values represent the width of the half-sine in seconds. This suggests that the actual estimated values for these flag events are much broader than those displayed in Figure 4. While this is probably fine for most models, it can be problematic for the extremum-flagging model, as it means that the rise to the peak takes between 0.1 and 0.45 seconds. While strictly speaking, this is still a 'flag' model, such a slow rise to the peak, given the usual expectation of evidence accumulation, would place this model closer to a smooth integration model than to a boundary-crossing flagging mechanism.
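
      The rise-time argument follows directly if the flag is assumed to be a half-sine of amplitude A, starting at the bound-crossing time t_0 and lasting w seconds:

```latex
f(t) = A \sin\!\left(\frac{\pi\,(t - t_0)}{w}\right), \qquad t_0 \le t \le t_0 + w,
\qquad \arg\max_t f(t) = t_0 + \frac{w}{2}
```

      The time to peak is therefore w/2, so the reported widths of 0.2 to 0.9 s imply rises of 0.1 to 0.45 s.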

      (4) In the modelling section, it is not clear overall (i.e. for G² and R²) how the participant dimension is taken into account. Are these individually fitted models, and if so, how are the secondary statistics generated from the individual estimates? Or were these fitted over all participants?

      (5) On page 7, in the last sentence of the first paragraph of the section titled 'Decision-Related Neural Signals', the authors state that 'this stable contrast-difference encoding suggests that a constant (i.e. non-adapting) drift rate is a reasonable simplifying model assumption'. However, I am not sure how this is true given that SSVEP quantifies encoding, yet the drift rate can vary due to non-sensory aspects (e.g. attention).

      (6) The mu/beta lateralisation does indeed favor the integration model more, but in terms of boundary estimation and starting-point analyses, both models are pretty far apart. Providing an interpretation of this observation, e.g. regarding alternative linking functions for mu/beta, would add to the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aim to compare proposed models of perceptual decision making using a joint modeling approach, where they fit models to both behavioral outcomes as well as CPP. Most notably, they compare a standard evidence accumulation model with models that track the evidence without integrating it over time (extrema detection). The authors report that the joint CPP-behavioral data do not discriminate between two of their proposals.

      Strengths:

      This is an interesting finding that reinforces the idea that what we believe to see based on aggregation over trials may not be what happens on every single trial. The models are creative, and the simulations are convincing, relating the models to multiple neural markers of decision formation. These include the CPP but also mu/beta power spectra.

      Weaknesses:

      The paper makes some strong points, and the work seems generally well-executed. The weaknesses that I identified are twofold:

      (1) Embedding in the literature/exposition of the main argument.

      The focus in the introduction is on the noise-free nature of the stimulus and the prolonged presentation time. However, after reading the paper, I felt these were mostly experimental design choices that enable comparison of the different models using the CPP. Perhaps my misreading of the goals of the paper stems from two other observations:

      a) The fact that the stimulus is noise-free does not entail that perception is noise-free. Thus, the argument that using a noise-free stimulus precludes the necessity of temporal integration seems not completely valid. Of course, one could argue that noise is limited in this case, but that makes a noise-free stimulus more of a design choice.

      b) The focus on prolonged stimulus presentation, but at the same time the contrast with expanded judgement, did not make sense to me. Perhaps, as a non-native speaker, I am misreading the subtle difference between "protracted sampling" and "longer sampling", but again, the longer duration seems mostly a design choice.

      More could be said about the optimality of the extrema detection methods. In particular, decades of work (centuries?) have shown that evidence integration is an optimal decision-making procedure: for example, the Sequential Probability Ratio Test is Bayes-optimal with respect to mean RT (Wald, 1946), and evidence accumulation together with a collapsing threshold serves to maximize rewards in repeated choices (e.g., Bogacz et al., PsychRev, 2006; Boehm et al., APP, 2020). Given all this work, why would the brain have evolved to adopt a different mechanism? I realize that the paper is not about optimal decision making, but some discussion of this point seems warranted.

      (2) Modeling choices.

      The authors introduce a parameter, sampT, that represents uncertainty in the sampling onset time. It was not clear to me whether this parameter represented an offset of all trials, or a distribution (probably the latter). I wonder how exactly this parameter was integrated into the models, and in particular, if and how it interacts with the starting-point parameters. My intuition is that on a single-trial, IF early sampling occurs, you can model that with either a negative sampT and z at 0, or with sampT at 0 but a shift in z. This would suggest trade-offs between these parameters, making them hard to estimate independently. Since the paper does not depend on the identification of parameter estimates, this may not be a huge problem, but nevertheless it is good to explore the consequences.

      The way the Bounded Integration model (BIntg) is formulated seems very close to the EZ-diffusion model (Wagenmakers et al., PBR, 2007). This model states that the proportion of correct responses Pc = 1/(1+exp(-B*D/s^2)), with B and D the bound and drift rate parameters, respectively. However, filling in the numbers for the high contrast condition from Table 2, and assuming that s=2 (because the model description states that dt=2, with s undefined), I get a Pc of 80% for the 1.6H condition. This seems substantially less than what Figure 2 suggests.
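
      The relation can be checked directly. The sketch below implements the formula as quoted (B, D and s as in this comment; the function names are hypothetical). An accuracy of 80% corresponds to B·D/s² = ln 4 ≈ 1.39, so with s = 2 the product B·D would be about 5.5. The same relation also shows why one scaling parameter must be fixed (compare Reviewer #2's minor comment (1) and point c) below): multiplying B, D and s by a common factor c leaves Pc unchanged, because cB·cD/(cs)² = B·D/s².

```python
from math import exp, log


def ez_p_correct(bound: float, drift: float, s: float) -> float:
    """Proportion correct under the EZ-diffusion relation
    Pc = 1 / (1 + exp(-B * D / s**2))."""
    return 1.0 / (1.0 + exp(-bound * drift / s**2))


def required_bd(pc: float, s: float) -> float:
    """Invert the relation: the product B * D that yields accuracy pc."""
    return s**2 * log(pc / (1.0 - pc))


print(f"B*D needed for Pc = 0.8 with s = 2: {required_bd(0.8, 2.0):.2f}")
```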

      On some occasions, it is unclear to me what modeling choices are being made:

      a) It seems as if the models are fit on accuracy data alone (before introducing the neural data). This seems suboptimal given that the authors do report differences in RT.

      b) Are the models fit on all data combined, or on the data of individual participants? Fitting individual participant data is preferred, as combined or aggregated data may be distorted by individual differences.

      c) The authors seem to suggest that the diffusion coefficient s is estimated (in the section "Integration models"). Most likely, however, this is set to a fixed value. Obviously, it matters for the model comparison using AIC whether this parameter was freely estimated or not.

      Not really a weakness, but I wondered about the effect of stimulus duration on RT. In particular, what hypothesis (or post hoc explanation) do the authors have for these RT effects? I could think of at least three hypotheses that are consistent with the behavioral data:

      a) H1: The shorter the evidence duration, the more likely participants are to require a double-check before response execution, reflecting their uncertainty about their decision.

      b) H2: There is a collapsing threshold that initiates at stimulus offset, leading to quicker responses on trials where there is more evidence.

      c) H3: Motor preparation is correlated with the evidence signal, which leads to faster responses on trials with more evidence.

    1. eLife Assessment

      This fundamental work significantly advances our understanding of the circuit-level implementation of predictive processing by elucidating the functional influence between putative prediction error neurons in layer 2/3 and putative internal representation neurons in layer 5. The evidence demonstrating that neither the hierarchical nor the non-hierarchical variant of predictive processing fully accounts for the presented data is convincing. Moving forward, this line of work would benefit from explicitly comparing different theories, thereby clearly articulating the points raised in this paper.

    2. Reviewer #1 (Public review):

      Vasilevskaya and Keller test different models of cortical function through the lens of predictive processing, a powerful framework for the brain to learn and predict the statistics of the world via generative internal models. The authors use a clever combination of behavioral perturbations in closed-loop and open-loop visuomotor virtual reality assays, a paradigm the Keller lab pioneered and used effectively in the past decade, in conjunction with two-photon imaging of neuronal calcium responses and targeted optogenetic perturbations of activity. They specifically put to test proposed hierarchical vs. non-hierarchical circuit implementations of predictive processing by analyzing the logic of inter-lamina interactions (superficial vs. deep; L2/3 vs. L5/6).

      The authors conclude that both versions of predictive processing architectures they analyze are likely invalid, and instead formulate a novel alternative model of cortical function based on a recently developed machine learning algorithm for self-supervised learning (joint-embedding predictive architecture, JEPA) and its further refinements. JEPA borrows elements from predictive processing, engaging two encoder networks and training the output of one network to predict the output of the other. In their new model of cortical computations, prediction error neurons in L2/3 compare deep-layer (L5/6) activity, which is taken as a teaching signal, with a local L2/3 prediction of this latent representation.

      Specifically, the authors build on their previous work and reports from other groups that different sets of L2/3 neurons compute positive prediction errors (firing when sensory stimuli appear unexpectedly with respect to the movements of the animal; e.g., grating onsets in the absence of locomotion) and negative prediction errors (firing when sensory stimuli are absent while the brain expected them to be present; e.g., mice locomote but visual flow is suddenly halted, i.e., visuomotor mismatches). These L2/3 positive and negative prediction error neurons exchange messages with neurons in the deeper cortical layers that, the authors propose, build an internal representation (R) of the sensory stimuli given the animals' movements.

      In the hierarchical model, internal representation neurons (R) are assumed to act as a teaching signal for both types of prediction error neurons, and the output of the positive prediction error neurons is assumed to suppress R activity such that the error between the teaching signal and the prediction is minimized. In the non-hierarchical version, by contrast, R serves as a prediction for the prediction error neurons and in turn receives excitatory drive from the positive prediction error neurons and inhibitory input from the negative prediction error neurons.

      The authors find that the functional impact of L5 neurons on L2/3 neurons is not compatible with the non-hierarchical architecture they and other groups proposed, but rather in accordance with the hierarchical model. At the same time, the functional impact of L2/3 neurons (positive vs. negative prediction error neurons) on L5 neurons (internal representation) appears not compatible with the hierarchical model, but rather in accordance with the non-hierarchical implementation.

      They further hypothesize that L2/3 prediction error neurons use not sensory input but L5 activity as a teaching signal, and test this by halting optogenetic stimulation of L5 neurons that had been coupled to locomotion (Figure 7).

      All in all, the question is topical, and the new model addresses a decades-long quest to develop a unifying model of cortical function. The findings reported here transform our understanding of cortical computations, opening new, exciting avenues for future investigation. The experimental design and execution are rigorous, and the arguments are clearly laid out (despite ample potential for confusion given the numerous loops and sign flips). They include a discussion of why the non-hierarchical model proposed by the same group does not hold, as well as potential caveats in interpreting the results and novel testable experiments emerging from the JEPA-like model.

      I have several questions about the interpretations of some of the claims and suggestions for potential additional experiments and analyses.

      (1) Some of the pieces of the puzzle remain to be identified and demonstrated: the existence of internal representation neurons in L2/3, and confirmation that the L5/6 neurons analyzed indeed function as internal representation neurons. The authors find that stimulation of L2/3 positive prediction error neurons enhances activity of L5 neurons. If L5 neurons hold a latent representation that serves as a teaching signal for L2/3 neurons (as the authors posit), wouldn't one expect the input they receive from the positive prediction error neurons to be suppressive, such that the error is further minimized?

      (2) Do the authors envision any specific differences between the representations of the two encoder networks posited to exist in L2/3 and L5 in the JEPA-like implementation? Are they synchronous/offset in their temporal representations, or any other features?

      (3) Where does the prediction onto L2/3 neurons come from? Does it emerge locally in L2/3 from the putative internal representation neurons, or is it long-range, as the authors' previous work proposed? Or is it a mix of both?

      (4) What is the role of the indiscriminate L4 input that appears to enhance activity of both positive and negative prediction error neurons in L2/3?

      (5) Does Figure 7D change in a meaningful manner if the authors plot the correlation between optomotor mismatch response and visuomotor mismatch response specifically for the negative prediction error neurons in L2/3 (Adamts-2) rather than for all L2/3 cells sampled?

      (6) Do the optomotor mismatch responses in L2/3 neurons depend on how long the closed-loop coupling between optogenetic stimulation of Tlx3 L5 neurons and locomotion speed has been in place?

    3. Reviewer #2 (Public review):

      This manuscript reveals the functional connectivity of two different classes of cortical neurons that respond in opposite ways to mismatches between sensory and top-down inputs. These data are very valuable because different theories of information processing in the cortex make different predictions on the patterns of connectivity of these neurons. Therefore, these data strongly constrain possible theories of cortical processing.

      General comments:

      (1) The methods of statistical testing are insufficiently described. I did not understand the description in lines 1105-1119. The authors should provide sufficient details so the reader can reproduce their analyses. For example, it may be helpful to provide specific details of the testing procedure for one of the comparisons (e.g. the first comparison in Table S1).

      (2) The authors should clarify how the problem of multiple comparisons was addressed for comparisons performed at multiple time points, where significance is indicated by a black bar (e.g., in Figure 2F).

      (3) It would be helpful to add a figure in the Discussion summarising the functional connectivity suggested by all experiments.

      (4) Throughout the manuscript, the authors use the term "teaching signals", but I am unclear what they mean by it. After reading the definition in lines 45-46, I thought that teaching signals corresponded to values (as they are compared to sensory signals). Later (lines 428-430), the text suggests that they correspond to error neurons. But lines 605-607 then say that they are not an error signal. The authors should define teaching signals precisely or remove this term.
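      For comment (2), one generic way to control errors across many time bins is the Benjamini-Hochberg false discovery rate procedure, which is valid under independence or positive dependence among tests (as is typical for adjacent time bins); cluster-based permutation tests, which exploit temporal contiguity, are a common alternative. The sketch below is illustrative only and is not the procedure used in the manuscript: it assumes per-bin p-values have already been computed, and the function name, alpha level, and example p-values are hypothetical.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of tests declared significant at false discovery rate alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # ranks, smallest p first
    thresholds = alpha * np.arange(1, m + 1) / m   # step-up thresholds k * alpha / m
    passing = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passing.any():
        k = np.nonzero(passing)[0].max()           # largest rank meeting its threshold
        significant[order[: k + 1]] = True         # reject every hypothesis up to rank k
    return significant


# Hypothetical per-time-bin p-values (e.g., one test per frame of a response trace).
p_per_bin = [0.001, 0.008, 0.039, 0.041, 0.20, 0.60]
print(benjamini_hochberg(p_per_bin).tolist())
```

      In this toy example, four bins fall below an uncorrected 0.05 threshold, but only the first two survive the correction, which is why the correction method behind each black bar matters for interpretation.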

    4. Reviewer #3 (Public review):

      Vasilevskaya and Keller set out to experimentally distinguish between two variants of predictive processing: a hierarchical and a non-hierarchical variant. The hierarchical variant assumes a hierarchical organization in which internal representation neurons (believed to be a subset of layer 5 excitatory neurons) serve as a source of a teaching signal for local prediction error neurons as well as for the next higher level of the hierarchy, while simultaneously providing prediction signals to the preceding lower level. In contrast, the non-hierarchical variant posits that these layer 5 internal representation neurons provide local predictions to layer 2/3 prediction error neurons.

      The interaction between internal representation neurons and prediction error neurons differs fundamentally between the two variants. In the hierarchical variant, internal representation neurons excite positive prediction error neurons and inhibit negative prediction error neurons, while at the same time being inhibited by positive prediction error neurons and excited by negative prediction error neurons. In the non-hierarchical variant, this pattern of connectivity is reversed.
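      The two connectivity patterns contrasted above can be summarized as a table of net signs. The sketch below is a deliberate simplification. It assumes each pathway reduces to a single net excitatory (+1) or inhibitory (-1) effect, and it ignores interneurons and dynamics. All names are hypothetical. It checks which variant a perturbation result is consistent with, using the result described in this study that stimulating L2/3 positive prediction error neurons enhances L5 activity.

```python
# Net sign of each pathway: +1 = excitatory, -1 = inhibitory (simplification).
# Keys are (source, target); "R" = internal representation neurons (putative L5),
# "pPE"/"nPE" = positive/negative prediction error neurons (putative L2/3).
HIERARCHICAL = {
    ("R", "pPE"): +1,   # R excites positive prediction error neurons
    ("R", "nPE"): -1,   # R inhibits negative prediction error neurons
    ("pPE", "R"): -1,   # positive prediction error neurons inhibit R
    ("nPE", "R"): +1,   # negative prediction error neurons excite R
}

# The non-hierarchical variant reverses every sign.
NON_HIERARCHICAL = {edge: -sign for edge, sign in HIERARCHICAL.items()}

VARIANTS = {"hierarchical": HIERARCHICAL, "non-hierarchical": NON_HIERARCHICAL}


def consistent_variants(source, target, observed_sign):
    """Variants whose predicted net effect of stimulating `source` on `target`
    matches the observed sign (+1 = enhancement, -1 = suppression)."""
    return [name for name, signs in VARIANTS.items()
            if signs[(source, target)] == observed_sign]


# Stimulating positive prediction error neurons enhanced L5 activity.
print(consistent_variants("pPE", "R", +1))
```

      Encoding the non-hierarchical variant as the sign flip of the hierarchical one shows why a single perturbation direction can discriminate between the variants. It also shows why the two directions of interaction (L5 to L2/3 and L2/3 to L5) can each favor a different variant.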

      This work is very exciting, timely, and carefully executed. The authors functionally, and later molecularly, identify layer 2/3 prediction error neurons in V1 and probe their interactions with genetically defined neuron types in cortical layers 5 and 6 using optogenetics. They demonstrate that the functional influence of putative prediction error neurons in layer 2/3 onto layer 5 is incompatible with the hierarchical variant, whereas the influence of layer 5 onto putative prediction error neurons in layer 2/3 is incompatible with the non-hierarchical variant. They then test an alternative hypothesis, in which layer 2/3 responses resemble prediction errors with respect to perturbations of artificial layer 5 activity patterns. To investigate this, they designed an experiment in which optogenetic activation of L5 IT neurons was closed-loop coupled to the mouse's locomotion speed in the absence of visual feedback, allowing them to probe the causal influence of L5 activity on layer 2/3 responses.

      Finally, the authors hypothesize that their data are more consistent with a joint embedding predictive architecture (JEPA) and outline experimentally testable predictions arising from this framework.

      While the work is overall convincing and significantly advances our understanding of the circuit-level implementation of predictive processing, there are a few weaknesses that should be addressed or discussed:

      (1) The authors define putative positive prediction error neurons as the 15% of neurons most responsive to grating onset and putative negative prediction error neurons as the 15% most responsive to visuomotor mismatch. While this selection would be expected to overlap with positive and negative prediction error neurons, respectively, the criterion is not sufficiently stringent (independent of the exact percentage chosen). In particular, classification of a neuron as a prediction error neuron should ideally be accompanied by evidence that it does not exhibit a significant increase in activity when the prediction matches the sensory input or teaching signal.

      (2) The authors "speculate that the prediction error responses in layer 2/3 may not be computed with respect to sensory input, but with respect to layer 5 activity as a teaching signal." However, it is unclear how this perspective differs from earlier statements in the manuscript. In the Introduction, the authors note that "these signals, typically referred to as sensory signals, we will refer to as teaching signals," and later describe the hierarchical variant as one "in which internal representation neurons act as a source of the teaching signal." Given this framing, it is difficult to identify what is conceptually novel in the updated view. Is the key distinction that layer 2/3 neurons are now proposed to generate predictions in an internal representation space rather than in sensory input space, as briefly suggested in the Discussion? Or are the authors introducing a distinction between an external (sensory) and an internal (cortical) teaching signal? If so, this distinction should be made explicit. Clarifying this point would considerably strengthen the manuscript.

      (3) The authors propose that "L2/3 neurons predict L5 activity, hence making predictions in the internal representation space rather than the input space," and further suggest that, since both deep and superficial cortical layers receive thalamic input, the cortex may function like a JEPA. This idea appears closely related to the model introduced by Nejad et al. (2025), which effectively implements a JEPA-like architecture: L5 activity serves as a target against which L2/3 predictions are compared in a self-supervised manner, with both L5 and L2/3 (via L4) receiving thalamic input. It would be helpful for the authors to clarify how their framework differs from that model, and to specify the key conceptual or mechanistic distinctions between the present proposal and the approach described by Nejad et al.
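      The stricter criterion suggested in point (1) can be made concrete. The sketch below is a hypothetical illustration, not the authors' analysis. It keeps the top-fraction selection and additionally requires that a neuron show no clear increase in a condition where prediction and input match. The function name, the array layout, and the z-like statistic with a 1.96 cutoff are all assumptions chosen for simplicity. A proper test, such as a signed-rank test corrected across neurons, would be preferable in practice.

```python
import numpy as np


def select_putative_pe_neurons(event_resp, matched_resp, top_frac=0.15, z_max=1.96):
    """Hypothetical stricter classification of prediction error neurons.

    event_resp:   (n_neurons,) mean response to the error-eliciting event
                  (e.g., grating onset or visuomotor mismatch).
    matched_resp: (n_neurons, n_trials) baseline-subtracted responses in a
                  condition where prediction and input match.
    Returns a boolean mask: in the top `top_frac` by event response AND
    without a clear increase (z < z_max) in the matched condition.
    """
    event_resp = np.asarray(event_resp, dtype=float)
    matched_resp = np.asarray(matched_resp, dtype=float)
    n = event_resp.size
    n_top = max(1, int(round(top_frac * n)))
    in_top = np.zeros(n, dtype=bool)
    in_top[np.argsort(event_resp)[::-1][:n_top]] = True   # most responsive neurons

    # One-sample z-like statistic of matched-condition responses per neuron.
    mean = matched_resp.mean(axis=1)
    sem = matched_resp.std(axis=1, ddof=1) / np.sqrt(matched_resp.shape[1])
    z = mean / np.maximum(sem, 1e-12)                     # guard against zero variance
    return in_top & (z < z_max)


# Hypothetical data: 20 neurons; neuron 19 responds most to the event,
# but it also responds when prediction and input match.
event = np.arange(20.0)
matched = np.tile([0.1, -0.1, 0.1, -0.1], (20, 1))
matched[19] = [1.0, 1.1, 0.9, 1.0]
mask = select_putative_pe_neurons(event, matched)
print(np.flatnonzero(mask).tolist())
```

      In this toy case, neuron 19 is among the top 15% by event response but is excluded because it also responds in the matched condition. This is the kind of neuron the top-fraction criterion alone would misclassify.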

    1. eLife Assessment

      This study presents a valuable finding on the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer. The evidence supporting the claims of the authors is solid, although inclusion of a larger number of patient samples, more statistical details and sufficient comparison with existing large-scale datasets would have strengthened the study. The work will be of interest to medical biologists working in the field of breast cancer.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates mutations and expression patterns of zinc finger proteins in Kenyan breast cancer patients. Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues.

      Strengths:

      Whole-exome sequencing and RNA-seq were performed on 23 breast cancer samples alongside matched normal tissues in Kenyan breast cancer patients. The authors identified mutations in ZNF217, ZNF703, and ZNF750.

      Weaknesses:

      (1) Research scope:

      The results primarily focus on mutations in ZNF217, ZNF703, and ZNF750, with limited correlation analyses between mutations and gene expression. The rationale for focusing only on these genes is unclear. Given the availability of large breast cancer cohorts such as TCGA and METABRIC, the authors should compare their mutation profiles with these datasets. Beyond European and U.S. cohorts, sequencing data from multiple countries, including a recent Nigerian breast cancer study (doi: 10.1038/s41467-021-27079-w), should also be considered. Since whole-exome sequencing was performed, it is unclear why only four genes were highlighted, and why comparisons to previous literature were not included.

      (2) Language and Style Issues

      There are many typos and clear errors in the main text (e.g., leftover "(ref)" placeholders).

      Additionally, several statements read unnaturally. For example:

      "Investigators uncovered 170 mutations ..." should instead be phrased as "We identified 170 mutations ...."

      "The research team ..." should be rephrased as "Our team ...."

      (3) Methods and Data Analysis Details

      The methods section is vague, with general descriptions rather than specific details of data processing and analysis. The authors should provide:

      (a) Parameters used for trimming, mapping, and variant calling (rather than referencing another paper such as Tang et al. 2023).

      (b) Statistical methods for somatic mutation/SNP detection.

      (c) Details of RNA purification and RNA-seq library preparation.

      Without these details, the reproducibility of the study is limited.

      (4) Data Reporting

      This study has the potential to provide a valuable resource for the field. However, data-sharing plans are unclear. The authors should:

      a) Deposit sequencing data in a public repository.

      b) Provide supplementary tables listing all detected mutations and all differentially expressed genes (DEGs).

      c) Clarify whether raw or adjusted p-values were used for DEG analysis.

      d) Perform DEG analyses stratified by breast cancer subtypes, since differential expression was observed by HER2 status, and some zinc finger proteins are known to be enriched in luminal subtypes.

      (5) Mutation Analysis

      Visualizations of mutation distribution across protein domains would greatly strengthen interpretation. Comparing mutation distribution and frequency with published datasets would also contextualize the findings.

      Comments on revisions:

      The revised manuscript hasn't addressed any of these concerns. Careful proofreading is recommended, even if the authors do not intend to make further modifications to the manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      This work integrated the mutational landscape and expression profile of ZNF molecules in 23 Kenyan women with breast cancer.

      Strengths:

      The mutation landscape of ZNF217, ZNF703, and ZNF750 was comprehensively studied and correlated with tumor stage and HER2 status to highlight its clinical significance.

      Weaknesses:

      The current cohort is too small to support significant findings, and the targeted focus on the ZNF family, without a stated rationale or clinical justification, limits the overall significance of the work.

    4. Reviewer #3 (Public review):

      Summary:

      This revised study analyzes the somatic mutational profiles and transcriptomic expression of three zinc-finger genes (ZNF217, ZNF703, ZNF750) in 23 Kenyan women with breast cancer, using whole-exome sequencing and RNA-sequencing of paired tumor-normal tissues. A total of 358 somatic mutations were detected, and all three genes were significantly upregulated in tumors compared to normal tissues (ZNF217 showing the most prominent difference). Higher expression was observed in HER2-positive tumors, though mutation burden for each gene did not correlate significantly with HER2 status or cancer stage. The findings provide preliminary evidence for the identification of diagnostic/prognostic biomarkers or therapeutic targets in sub-Saharan African populations.

      Strengths:

      The study's key strengths lie in its focus on an underrepresented Kenyan cohort, addressing a critical gap in sub-Saharan African breast cancer genomic research. It integrates DNA-level mutation analysis with RNA-level expression data, leveraging standardized bioinformatics pipelines (e.g., Mutect2 for variant calling, DESeq2 for differential expression) and rigorous quality control to deliver detailed insights into mutation types, functional impacts, and amino acid changes. Additionally, it explores gene expression patterns across different cancer stages and HER2 status subgroups, generating targeted hypotheses for future validation and enhancing the reliability of its findings.

      Weaknesses:

      The author has enhanced the descriptive depth of the study by adding details on mutations, expression subgroup analyses, and functional annotations but has not addressed the core weaknesses of small cohort size and lack of functional validation. While the revised version is more comprehensive in cataloging molecular alterations, it remains confined to descriptive analysis, with no substantial improvement in the reliability or generalizability of its conclusions.

    1. eLife Assessment

      This valuable study characterises the activity of motor units from two of the three anatomical subdivisions ("heads") of the triceps muscle while mice walked on a treadmill at various speeds. Altogether, this is the most thorough characterisation of motor unit activity in walking mice to date, providing convincing evidence for probabilistic recruitment of motor units that differed between the two heads.

    2. Reviewer #1 (Public review):

      [Editors' note: this version has been assessed by the Reviewing Editor without further input from the original reviewers. The authors have addressed the comments raised in the previous round of review.]

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observe differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools, and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

    3. Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to characterise the firing activity of individual motor units in mice during locomotion. To achieve this, the team implanted small arrays of eight electrodes into two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Concurrently, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice across five speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors demonstrate that:

      - Their recording method and adapted spike-sorting algorithm enable robust decoding of motor unit activity during rapid movements.

      - Identified motor units tend to be recruited during a subset of strides, with recruitment probability increasing with speed.

      - Motor units within individual heads of the triceps likely receive common synaptic inputs that correlate their activity, whereas motor units from different heads exhibit distinct behaviour.
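      Recruitment probability, as used throughout these reviews, is the fraction of strides in which a unit fires at least one spike. The sketch below shows one way to compute it from sorted spike times and stride onset times, such as those derived from DeepLabCut stride timing. It assumes that each stride spans the interval between consecutive onsets. The function name and inputs are hypothetical, and this is not the authors' pipeline.

```python
import numpy as np


def recruitment_probability(spike_times, stride_onsets):
    """Return (fraction of strides with >= 1 spike, spike count per stride).

    Stride i spans [stride_onsets[i], stride_onsets[i + 1]); note that
    np.histogram also includes the right edge in the final bin.
    """
    spikes = np.asarray(spike_times, dtype=float)
    edges = np.asarray(stride_onsets, dtype=float)
    counts, _ = np.histogram(spikes, bins=edges)   # spikes per stride
    recruited = counts > 0                         # stride recruited if unit fired
    return float(recruited.mean()), counts


# Hypothetical example: three strides, unit silent during the second one.
prob, counts = recruitment_probability([0.10, 0.15, 1.20], [0.0, 0.4, 0.8, 1.6])
print(prob, counts.tolist())
```

      Here the unit is recruited in two of three strides. Computing this value separately at each treadmill speed would give the speed dependence described above.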

      The authors conclude that these differences arise from the distinct functional roles of the muscles and the task constraints (i.e., speed).

      Strengths:

      - The novel combination of electrode arrays for recording intramuscular electromyographic signals from a larger muscle volume, paired with an advanced spike-sorting pipeline capable of identifying motor unit populations.

      - The robustness of motor unit decoding during fast movements.

      Weaknesses:

      - The data do not clearly indicate which motor units were sampled from each pool, leaving uncertainty as to whether the sample is biased towards high-threshold motor units or representative of the entire pool.

      - The results largely confirm the classic physiological framework of motor unit recruitment and rate coding, offering limited new insights into motor unit physiology.

      Comments on previous version:

      I would like to thank the authors for their thorough and insightful revisions. I am particularly pleased with the inclusion of the new analyses demonstrating the robustness of motor unit decoding, as well as the improved transparency regarding spike-sorting yield for each muscle and animal. Additionally, the new analyses illustrating that recruitment within muscle heads is consistent with the presence of common synaptic inputs and orderly recruitment significantly strengthen the manuscript.

    4. Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that 1) motor units are recruited differently in the two heads of the triceps, 2) individual units are probabilistically recruited during locomotor strides, whereas the population bulk EMG provides a more reliable representation of muscle activity, and 3) unit recruitment increased with walking speed.

      Strengths:

      The new technique provides a unique dataset, and the data analysis is convincing and well-executed.

      Weaknesses:

      After the revision, I no longer see any apparent weaknesses in the study.

    5. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Here, the authors have addressed the recruitment and firing patterns of motor units (MUs) from the long and lateral heads of the triceps in the mouse. They used their newly developed Myomatrix arrays to record from these muscles during treadmill locomotion at different speeds, and they used template-based spike sorting (Kilosort) to extract units. Between MUs from the two heads, the authors observed differences in their firing rates, recruitment probability, phase of activation within the locomotor cycle, and interspike interval patterning. Examining different walking speeds, the authors find increases in both recruitment probability and firing rates as speed increases. The authors also observed differences in the relation between recruitment and the angle of elbow extension between motor units from each head. These differences indicate meaningful variation between motor units within and across motor pools and may reflect the somewhat distinct joint actions of the two heads of triceps.

      Strengths:

      The extraction of MU spike timing for many individual units is an exciting new method that has great promise for exposing the fine detail in muscle activation and its control by the motor system. In particular, the methods developed by the authors for this purpose seem to be the only way to reliably resolve single MUs in the mouse, as the methods used previously in humans and in monkeys (e.g. Marshall et al. Nature Neuroscience, 2022) do not seem readily adaptable for use in rodents.

      The paper provides a number of interesting observations. There are signs of interesting differences in MU activation profiles for individual muscles here, consistent with those shown by Marshall et al. It is also nice to see fine-scale differences in the activation of different muscle heads, which could relate to their partially distinct functions. The mouse offers greater opportunities for understanding the control of these distinct functions, compared to the other organisms in which functional differences between heads have previously been described.

      The Discussion is very thorough, providing a very nice recounting of a great deal of relevant previous results.

      We thank the Reviewer for these comments.

      Weaknesses:

      The findings are limited to one pair of muscle heads. While an important initial finding, the lack of confirmation from analysis of other muscles acting at other joints leaves the general relevance of these findings unclear.

      The Reviewer raises a fair point. While outside the scope of this paper, future studies should certainly address a wider range of muscles to better characterize motor unit firing patterns across different sets of effectors with varying anatomical locations. Still, the importance of results from the triceps long and lateral heads should not be understated, as this paper, to our knowledge, is the first to capture the difference in firing patterns of motor units across any set of muscles in the locomoting mouse.

      While differences between muscle heads with somewhat distinct functions are interesting and relevant to joint control, differences between MUs for individual muscles, like those in Marshall et al., are more striking because they cannot be attributed potentially to differences in each head's function. The present manuscript does show some signs of differences for MUs within individual heads: in Figure 2C, we see what looks like two clusters of motor units within the long head in terms of their recruitment probability. However, a statistical basis for the existence of two distinct subpopulations is not provided, and no subsequent analysis is done to explore the potential for differences among MUs for individual heads.

      We agree with the Reviewer and have revised the manuscript to better examine potential subpopulations of units within each muscle as presented in Figure 2C. We performed Hartigan’s dip test on motor units within each muscle to test for multimodal distributions. For both muscles, p > 0.05, so we cannot reject the null hypothesis that the units in each muscle come from a unimodal distribution. However, Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.

      Still, the limited sample size warrants further data collection and analysis since the varying properties across motor units may lead to different activation patterns. Given these results, we have edited the text as follows:

      “A subset of units, primarily in the long head, were recruited in under 50% of the total strides and with lower spike counts (Figure 2C). This distribution of recruitment probabilities might reflect a functionally different subpopulation of units. However, the distribution of recruitment probabilities was not found to be significantly multimodal (p>0.05 in both cases, Hartigan’s dip test; Hartigan, 1985). Note that Hartigan’s test and similar statistical methods have poor statistical power for the small sample sizes (n=17 and 16 for long and lateral heads, respectively) considered here, so the failure to achieve statistical significance might reflect either the absence of a true difference or a lack of statistical resolution.”

      The statistical foundation for some claims is lacking. In addition, the description of key statistical analysis in the Methods is too brief and very hard to understand. This leaves several claims hard to validate.

      We thank the Reviewer for these comments and have clarified the text related to key statistical analyses throughout the manuscript, as described in our other responses below.

      Reviewer #2 (Public review):

      The present study, led by Thomas and collaborators, aims to describe the firing activity of individual motor units in mice during locomotion. To achieve this, they implanted small arrays of eight electrodes in two heads of the triceps and performed spike sorting using a custom implementation of Kilosort. Simultaneously, they tracked the positions of the shoulder, elbow, and wrist using a single camera and a markerless motion capture algorithm (DeepLabCut). Repeated one-minute recordings were conducted in six mice at five different speeds, ranging from 10 to 27.5 cm·s⁻¹.

      From these data, the authors reported that:

      (1) a significant portion of the identified motor units was not consistently recruited across strides,

      (2) motor units identified from the lateral head of the triceps tended to be recruited later than those from the long head,

      (3) the number of spikes per stride and peak firing rates were correlated in both muscles, and

      (4) the probability of motor unit recruitment and firing rates increased with walking speed.

      The authors conclude that these differences can be attributed to the distinct functions of the muscles and the constraints of the task (i.e., speed).

      Strengths:

      The combination of novel electrode arrays to record intramuscular electromyographic signals from a larger muscle volume with an advanced spike sorting pipeline capable of identifying populations of motor units.

      We thank the Reviewer for this comment.

      Weaknesses:

      (1) There is a lack of information on the number of identified motor units per muscle and per animal.

      The Reviewer is correct that this information was not explicitly provided in the prior submission. We have therefore added Table 1 that quantifies the number of motor units per muscle and per animal.

      (2) All identified motor units are pooled in the analyses, whereas per-animal analyses would have been valuable, as motor units within an individual likely receive common synaptic inputs. Such analyses would fully leverage the potential of identifying populations of motor units.

      Please see our answer to the following point, where we address questions (2) and (3) together.

      (3) The current data do not allow for determining which motor units were sampled from each pool. It remains unclear whether the sample is biased toward high-threshold motor units or representative of the full pool.

      We thank the Reviewer for these comments. To clarify how motor unit responses were distributed across animals and muscle targets, we updated or added the following figures:  

      Figure 2C

      Figure 4–figure supplement 1

      Figure 5–figure supplement 2

      Figure 6–figure supplement 2

      These provide a more complete look at the range of activity within each motor pool, suggesting that we do measure from units with different activation thresholds within the same motor pool, rather than this variation being due to cross-animal differences. For example, Figure 2C illustrates that motor units from the same muscle and animal show a wide variety of recruitment probabilities. However, the limited number of motor units recorded from each individual animal does not allow a statistically rigorous test for examining cross-animal differences.

      (4) The behavioural analysis of the animals relies solely on kinematics (2D estimates of elbow angle and stride timing). Without ground reaction forces or shoulder angle data, drawing functional conclusions from the results is challenging.

      The Reviewer is correct that we did not measure muscular force generation or ground reaction forces in the present study. Although outside the scope of this study, future work might employ buckle force transducers as used in larger animals (Biewener et al., 1988; Karabulut et al., 2020) to examine the interplay between neural commands, passive biomechanics, and the complex force-generating properties of muscle tissue.

      Major comments:

      (1) Spike sorting

      The conclusions of the study rely on the accuracy and robustness of the spike sorting algorithm during a highly dynamic task. Although the pipeline was presented in a previous publication (Chung et al., 2023, eLife), a proper validation of the algorithm for identifying motor unit spikes is still lacking. This is particularly important in the present study, as the experimental conditions involve significant dynamic changes. Under such conditions, muscle geometry is altered due to variations in both fibre pennation angles and lengths.

      This issue differs from electrode drift, and it is unclear whether the original implementation of Kilosort includes functions to address it. Could the authors provide more details on the various steps of their pipeline, the strategies they employed to ensure consistent tracking of motor unit action potentials despite potential changes in action potential waveforms, and the methods used for manual inspection of the spike sorting algorithm's output?

      This is an excellent point and we agree that the dynamic behavior used in this investigation creates potential new challenges for spike sorting. In our analysis, Kilosort 2.5 provides key advantages in comparing unit waveforms across multiple channels and in detecting overlapping spikes. We modified this version of Kilosort to construct unit waveform templates using only the channels within the same muscle (Chung et al., 2023), as clarified in the revised Methods section (see “Electromyography (EMG)”):

      “A total of 33 units were identified across all animals. Each unit’s isolation was verified by confirming that no more than 2% of inter-spike intervals violated a 1 ms refractory limit. Additionally, we manually reviewed cross-correlograms to ensure that each waveform was only reported as a single motor unit.”
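      The isolation criterion quoted above (no more than 2% of inter-spike intervals shorter than a 1 ms refractory limit) can be expressed in a few lines. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical, spike times are assumed to be in seconds, and an interval is counted as a violation only when it is strictly shorter than the limit.

```python
def isi_violation_fraction(spike_times_s, refractory_s=0.001):
    """Fraction of inter-spike intervals shorter than the refractory limit."""
    times = sorted(spike_times_s)
    isis = [b - a for a, b in zip(times, times[1:])]
    if not isis:
        return 0.0  # fewer than two spikes: no intervals to violate
    return sum(isi < refractory_s for isi in isis) / len(isis)


def passes_isolation(spike_times_s, refractory_s=0.001, max_fraction=0.02):
    """True if no more than `max_fraction` of ISIs violate the refractory limit."""
    return isi_violation_fraction(spike_times_s, refractory_s) <= max_fraction


# A unit with one 0.5 ms interval out of three intervals fails the 2% criterion.
print(passes_isolation([0.010, 0.0105, 0.050, 0.100]))
```

      Cross-correlogram review, the second check in the quoted text, is a manual step and is not captured by this sketch.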

      The Reviewer is correct that our ability to precisely measure a unit’s activity based on its waveform will depend on the relationship between the embedded electrode and the muscle geometry, which alters over the course of the stride. As a follow-up to the original text, we have included new analyses to characterize the waveform activity throughout the experiment and stride (also in Methods):

      “We further validated spike sorting by quantifying the stability of each unit’s waveform across time (Figure 1–figure supplement 1). First, we calculated the median waveform of each unit across every trial to capture long-term stability of motor unit waveforms. Additionally, we calculated the median waveform through the stride binned in 50 ms increments using spiking from a single trial. This second metric captures the stability of our spike sorting during the rapid changes in joint angles that occur during the burst of an individual motor unit. In doing so, we calculated each motor unit’s waveforms from the single channel in which that unit’s amplitude was largest and did not attempt to remove overlapping spikes from other units before measuring the median waveform from the data. We then correlated a unit’s median waveforms across trials (or across bins), including only trials or bins that contained at least 30 spikes. The high correlation of a unit waveform over time, despite potential changes in the electrodes’ position relative to muscle geometry over the dynamic task, provides additional confidence in both the stability of our EMG recordings and the accuracy of our spike sorting.”

      We have added a figure supplement to Figure 1 (Figure 1–figure supplement 1) to highlight the effectiveness of our spike sorting.

      (2) Yield of the spike sorting pipeline and analyses per animal/muscle

      A total of 33 motor units were identified from two heads of the triceps in six mice (17 from the long head and 16 from the lateral head). However, precise information on the yield per muscle per animal is not provided. This information is crucial to support the novelty of the study, as the authors claim in the introduction that their electrode arrays enable the identification of populations of motor units. Beyond reporting the number of identified motor units, another way to demonstrate the effectiveness of the spike sorting algorithm would be to compare the recorded EMG signals with the residual signal obtained after subtracting the action potentials of the identified motor units, using a signal-to-residual ratio.

      Furthermore, motor units identified from the same muscle and the same animal are likely not independent due to common synaptic inputs. This dependence should be accounted for in the statistical analyses when comparing changes in motor unit properties across speeds and between muscles.

      We thank the Reviewer for this comment. Regarding motor unit yield, as described above the newly-added Table 1 displays the yield from each animal and muscle.

      Regarding spike sorting, while the signal-to-residual ratio is often an excellent metric, it is not ideal for our high-resolution EMG signals, since isolated single motor units are typically superimposed on a “bulk” background consisting of the low-amplitude waveforms of other motor units. Because these smaller units typically cannot be sorted, it is challenging to estimate the “true” residual after subtracting only the largest motor units: subtracting each sorted unit’s waveform typically has a very small effect on the RMS of the total EMG signal. To further address concerns regarding spike sorting quality, we added Figure 1–figure supplement 1, which demonstrates motor units’ consistency over the experiment, highlighting that the waveform maintains its shape within each stride despite muscle/limb dynamics and other possible sources of electrical noise or artifact.
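      For reference, the signal-to-residual ratio discussed above is commonly expressed in decibels as 20·log10(RMS(signal) / RMS(residual)). The sketch below assumes a hypothetical `reconstruction` array: the sum of each sorted unit's template placed at its spike times. It also illustrates the point made in the response: if the sorted units carry little of the signal power, the residual is nearly as large as the signal and the ratio stays near 0 dB, regardless of how well those units were sorted.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def signal_to_residual_db(emg, reconstruction):
    """20*log10(RMS(emg) / RMS(emg - reconstruction)) in decibels."""
    residual = [e - r for e, r in zip(emg, reconstruction)]
    return 20.0 * math.log10(rms(emg) / rms(residual))
```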

      Finally, the Reviewer is correct that individual motor units in the same muscle are very likely to receive common synaptic inputs. Such common inputs may cause sparsely recruited motor units to be recruited in overlapping rather than separate strides. Indeed, in the following text added to the Results, we show that motor units are recruited with higher probability when additional units are recruited.

      “Probabilistic recruitment is correlated across motor units

      Our results show that the recruitment of individual motor units is probabilistic even within a single speed quartile (Figure 5A-C) and predicts body movements (Figure 6), raising the question of whether the recruitment of individual motor units is correlated or independent. Correlated recruitment might reflect shared input onto the population of motor units innervating the muscle (De Luca, 1985; De Luca & Erim, 1994; Farina et al., 2014). For example, two motor units, each with low recruitment probabilities, may still fire during the same set of strides. To assess the independence of motor unit recruitment across the recorded population, we compared each unit’s empirical recruitment probability across all strides to its conditional recruitment probability during strides in which another motor unit from the same muscle was recruited (Figure 7). Doing this for all motor unit pairs revealed that motor units in both muscles were biased towards greater recruitment when additional units were active (p<0.001, Wilcoxon signed-rank tests for both the lateral and long heads of triceps). This finding suggests that probabilistic recruitment reflects common synaptic inputs that covary together across locomotor strides.”
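      The pairwise comparison in the quoted Results text (each unit's overall recruitment probability versus its probability conditioned on another unit from the same muscle being recruited) could be computed as in the sketch below. The data layout, one row of booleans per stride with True meaning the unit fired at least once, is an assumption, and the function name is hypothetical. The resulting paired values would then go to a signed-rank test (for example `scipy.stats.wilcoxon`), which is omitted here to keep the example dependency-free.

```python
def recruitment_bias(recruited):
    """Pair p(unit i recruited) with p(unit i recruited | unit j recruited).

    recruited: list of strides; each stride is a list of booleans, one per
    motor unit from the same muscle (True = at least one spike that stride).
    Returns (i, j, p_i, p_i_given_j) for every ordered pair i != j in which
    unit j was recruited on at least one stride.
    """
    n_strides = len(recruited)
    n_units = len(recruited[0])
    results = []
    for i in range(n_units):
        p_i = sum(stride[i] for stride in recruited) / n_strides
        for j in range(n_units):
            if i == j:
                continue
            strides_with_j = [stride for stride in recruited if stride[j]]
            if not strides_with_j:
                continue  # conditional probability undefined
            p_i_given_j = sum(s[i] for s in strides_with_j) / len(strides_with_j)
            results.append((i, j, p_i, p_i_given_j))
    return results
```

      If recruitment were independent across units, p_i_given_j would scatter around p_i; values consistently above p_i indicate the positive coupling reported in the text.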

      (3) Representativeness of the sample of identified motor units

      However, to draw such conclusions, the authors should exclusively compare motor units from the same pool and systematically track violations of the recruitment order. Alternatively, they could demonstrate that the motor units that are intermittently active across strides correspond to the smallest motor units, based on the assumption that these units should always be recruited due to their low activation thresholds.

      One way to estimate the size of motor units identified within the same muscle would be to compare the amplitude of their action potentials, assuming that all motor units are relatively close to the electrodes (given the selectivity of the recordings) and that motoneurons innervating more muscle fibres generate larger motor unit action potentials.

      We thank the Reviewer for this comment. Below, we provide more detailed analyses of the relationships between motor unit spike amplitude and the recruitment probability as well as latency (relative to stride onset) of activation.

      We generated Author response image 1 to illustrate the relationship between the amplitude of motor units and their firing properties. As suspected, units with larger-amplitude waveforms fired with lower probability and produced their first spikes later in the stride. If we were comfortable assuming that larger spike amplitudes mean higher-force units, then this would be consistent with a key prediction of the size principle (i.e. that higher-force units are recruited later). However, we are hesitant to base any conclusions on this assumption or emphasize this point with a main-text figure, since EMG signal amplitude may also vary due to the physical properties of the electrode and distance from muscle fibers. Thus it is possible that a large motor unit may have a smaller waveform amplitude relative to the rest of the motor pool.

      Author response image 1.

      Relation between motor unit amplitude and (A) recruitment probability and (B) mean first spike time within the stride. Colored lines indicate the outcome of linear regression analyses.

      Currently, the data seem to support the idea that motor units that are alternately recruited across strides have recruitment thresholds close to the level of activation or force produced during slow walking. The fact that recruitment probability monotonically increases with speed suggests that the force required to propel the mouse forward exceeds the recruitment threshold of these "large" motor units. This pattern would primarily reflect spatial recruitment following the size principle rather than flexible motor unit control.

      We thank the Reviewer for this comment. We agree with this interpretation, particularly in relation to the references suggested in later comments, and have added the following text to the Discussion to better reflect this argument:

      “To investigate the neuromuscular control of locomotor speed, we quantified speed-dependent changes in both motor unit recruitment and firing rate. We found that the majority of units were recruited more often and with larger firing rates at faster speeds (Figure 5, Figure 5–figure supplement 1). This result may reflect speed-dependent differences in the common input received by populations of motor neurons with varying spiking thresholds (Henneman et al., 1965). In the case of mouse locomotion, faster speeds might reflect a larger common input, increasing the recruitment probability as more neurons, particularly those that are larger and generate more force, exceed threshold for action potentials (Farina et al., 2014).”

      (4) Analysis of recruitment and firing rates

      The authors currently report active duration and peak firing rates based on spike trains convolved with a Gaussian kernel. Why not report the peak of the instantaneous firing rates estimated from the inverse of the inter-spike interval? This approach appears to be more aligned with previous studies conducted to describe motor unit behaviour during fast movements (e.g., Desmedt & Godaux, 1977, J Physiol; Van Cutsem et al., 1998, J Physiol; Del Vecchio et al., 2019, J Physiol).

      We thank the Reviewer for this comment. In the revised Discussion (see ‘Firing rates in mouse locomotion compared to other species’) we reference several examples of previous studies that quantified spike patterns based on the instantaneous firing rate. We chose to report the peak of the smoothed firing rate because that quantification includes strides with zero spikes or only one spike, which occur regularly in our dataset (and for which ISI rate measures, which require two spikes to define an instantaneous firing rate, cannot be computed). Regardless, in the revised Figure 4B, we present an analysis that uses inter-spike intervals as suggested, which yielded similar ranges of firing rates as the primary analysis.
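      The practical difference between the two rate measures discussed above can be shown directly: a Gaussian-smoothed rate is defined for strides with zero or one spike, whereas an ISI-based instantaneous rate needs at least two spikes. In this minimal sketch the 10 ms kernel width and 1 ms evaluation grid are hypothetical values (the kernel parameters are not given in this section), and the function names are illustrative.

```python
import math


def peak_smoothed_rate(spike_times_s, t_start, t_end, sigma_s=0.01, dt_s=0.001):
    """Peak of the spike train convolved with a unit-area Gaussian (spikes/s).

    Defined for any spike count: a stride with no spikes returns 0.0.
    sigma_s and dt_s are illustrative assumptions, not values from the study.
    """
    n_steps = int(round((t_end - t_start) / dt_s)) + 1
    norm = 1.0 / (sigma_s * math.sqrt(2.0 * math.pi))
    peak = 0.0
    for k in range(n_steps):
        t = t_start + k * dt_s
        rate = sum(norm * math.exp(-0.5 * ((t - s) / sigma_s) ** 2)
                   for s in spike_times_s)
        peak = max(peak, rate)
    return peak


def peak_isi_rate(spike_times_s):
    """Peak instantaneous rate, 1 / shortest ISI; None with fewer than 2 spikes."""
    times = sorted(spike_times_s)
    if len(times) < 2:
        return None
    return 1.0 / min(b - a for a, b in zip(times, times[1:]))
```

      Note that the smoothed peak depends on the kernel width: narrower kernels push it toward the ISI-based value for closely spaced spikes, while wider kernels lower it.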

      (5) Additional analyses of behaviour

      The authors currently analyse motor unit recruitment in relation to elbow angle. It would be valuable to include a similar analysis using the angular velocity observed during each stride. More broadly, comparing stride-by-stride changes in firing rates with changes in elbow angular velocity would further strengthen the final analyses presented in the results section.

      We thank the Reviewer for this comment. To address this, we have modified Figure 6 and the associated Supplemental Figures to show the relationships between unit activation and both the range of elbow extension and the range of elbow velocity for each stride. These new Supplemental Figures show that the trends shown in main text Figure 6C and 6E (which show data from all speed quartiles on the same axes) are also apparent in both the slower and faster quartiles individually, although single-quartile statistical tests (with smaller sample sizes than the main analysis) did not reach statistical significance in all cases.
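      The per-stride kinematic quantities described above (range of elbow angle and range of elbow angular velocity) could be derived from a tracked angle trace roughly as follows. This sketch assumes the angle is sampled uniformly at a known frame rate and estimates velocity with central differences (one-sided at the stride edges). The differentiation scheme is an assumption, and real tracking data would usually be low-pass filtered first, since differentiation amplifies tracking noise.

```python
def stride_kinematic_ranges(angles_deg, fs_hz):
    """Range of elbow angle (deg) and of angular velocity (deg/s) in one stride.

    angles_deg: elbow angle samples for one stride, sampled at fs_hz.
    Velocity uses central differences, falling back to one-sided at the ends.
    """
    n = len(angles_deg)
    if n < 2:
        raise ValueError("need at least two samples per stride")
    dt = 1.0 / fs_hz
    vel = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        vel.append((angles_deg[hi] - angles_deg[lo]) / ((hi - lo) * dt))
    return max(angles_deg) - min(angles_deg), max(vel) - min(vel)
```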

      Reviewer #3 (Public review):

      Summary:

      Using the approach of Myomatrix recording, the authors report that:

      (1) Motor units are recruited differently in the two types of muscles.

      (2) Individual units are probabilistically recruited during the locomotion strides, whereas the population bulk EMG has a more reliable representation of the muscle.

      (3) The recruitment of units was proportional to walking speed.

      Strengths:

      The new technique provides a unique data set, and the data analysis is convincing and well-performed.

      We thank the Reviewer for the comment.

      Weaknesses:

      The implications of "probabilistic recruitment" should be explored, addressed, and analyzed further.

      Comments:

      One of the study's main findings (perhaps the main finding) is that the motor units are "probabilistically" recruited. The authors do not define what they mean by probabilistically recruited, nor do they present an alternative scenario to such recruitment or discuss why this would be interesting or surprising. However, on page 4, they do indicate that the recruitment of units from both muscles was only active in a subset of strides, i.e., they are not reliably active in every step.

      If probabilistic means irregular spiking, this is not new. Variability in spiking has been seen numerous times, for instance in human biceps brachii motor units during isometric contractions (Pascoe, Enoka, Exp physiology 2014) and elsewhere. Perhaps the distinction the authors are seeking is between fluctuation-driven and mean-driven spiking of motor units as previously identified in spinal motor networks (see Petersen and Berg, eLife 2016, and Berg, Frontiers 2017). Here, it was shown that a prominent regime of irregular spiking is present during rhythmic motor activity, which also manifests as a positive skewness in the spike count distribution (i.e., log-normal).

      We thank the Reviewer for this comment and have clarified several passages in response. The Reviewer is of course correct that irregular motor unit spiking has been described previously and may reflect motor neurons operating in a high-sensitivity (fluctuation-driven) regime. We now cite these papers in the Discussion (see ‘Firing rates in mouse locomotion compared to other species’). Additionally, the revision clarifies that “probabilistically” - as defined in our paper - refers only to the empirical observation that a motor unit spikes during only a subset of strides, either when all locomotor speeds are considered together (Figure 2) or separately (Figure 5A-C):

      “Motor units in both muscles exhibited this pattern of probabilistic recruitment (defined as a unit’s firing on only a fraction of strides), but with differing distributions of firing properties across the long and lateral heads (Figure 2).”

      “Our findings (Figure 4) highlight that even with the relatively high firing rates observed in mice, there are still significant changes in firing rate and recruitment probability across the spikes within bursts (Figure 4B) and across locomotor speeds (Figure 5F). Future studies should more carefully examine how these rapidly changing spiking patterns derive from both the statistics of synaptic inputs and intrinsic properties of motor neurons (Manuel & Heckman, 2011; Petersen & Berg, 2016; Berg, 2017).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      As mentioned above, there are several issues with the statistics that need to be corrected to properly support the claims made in the paper.

      The authors compare the fractions of MUs that show significant variation across locomotor speeds in their firing rate and recruitment probability. However, it is not statistically founded to compare the results of separate statistical tests based on different kinds of measurements and thus have unconstrained differences in statistical power. The comparison of the fractional changes in firing rates and recruitment across speeds that follow is helpful, though in truth, by contemporary standards, one would like to see error bars on these estimates. These could be generated using bootstrapping.

      The Reviewer is correct, and we have revised the manuscript to better clarify which quantities should or should not be compared, including the following passage (see “Motor unit mechanisms of speed control” in Results):

      “Speed-dependent increases in peak firing rate were therefore also present in our dataset, although in a smaller fraction of motor units (22/33) than changes in recruitment probability (31/33). Furthermore, the mean (± SE) magnitude of speed-dependent increases was smaller for spike rates (mean rate_fast/rate_slow of 111% ± 20% across all motor units) than for recruitment probabilities (mean p(recruitment)_fast/p(recruitment)_slow of 179% ± 3% across all motor units). While fractional changes in rate and recruitment probability are not readily comparable given their different upper limits, these findings could suggest that while both recruitment and peak rate change across speed quartiles, increased recruitment probability may play a larger role in driving changes in locomotor speed.”

      The description in the Methods of the tests for variation in firing rates and recruitment probability across speeds are extremely hard to understand - after reading many times, it is still not clear what was done, or why the method used was chosen. In the main text, the authors quote p-values and then state "bootstrap confidence intervals," which is not a statistical test that yields a p-value. While there are mathematical relationships between confidence intervals and statistical tests such that a one-to-one correspondence between them can exist, the descriptions provided fall short of specifying how they are related in the present instance. For this reason, and those described in what follows, it is not clear what the p-values represent.

      Next, the authors refer to fitting a model ("a Poisson distribution") to the data to estimate firing rate and recruitment probability, that the model results agree with their actual data, and that they then bootstrapped from the model estimates to get confidence intervals and compute p-values. Why do this? Why not just do something much simpler, like use the actual spike counts, and resample from those? I understand that it is hard to distinguish between no recruitment and just no spikes given some low Poisson firing rate, but how does that challenge the ability to test if the firing rates or the number of spiking MUs changes significantly across speeds? I can come up with some reasons why I think the authors might have decided to do this, but reasoning like this really should be made explicit.

      In addition, the authors should provide an unambiguous description of the model, perhaps using an equation and a description of how it was fit. For the bootstrapping, a clear description of how the resampling was done should be included. The focus on peak firing rate instead of mean (or median) firing rate should also be justified. Since peaks are noisier, I would expect the statistical power to be lower compared to using the mean or median.

      We thank the Reviewer for the comments and have revised and expanded our discussion of the statistical tests employed. We expanded and clarified our description of these techniques in the updated Methods section:

      “Joint model of rate and recruitment

      We modeled the recruitment probability and firing rate based on empirical data to best characterize firing statistics within the stride. In particular, this allowed for two possible explanations of why a motor unit would not spike within a stride. From the empirical data alone, strides with zero spikes would have been assumed to reflect no recruitment of a unit. However, a model of motor unit activity that includes both recruitment and rate must allow a recruited unit to have a firing rate of zero. To quantify the firing statistics that best represent all spiking and non-spiking patterns, we modeled recruitment probability and peak firing rate with the following piecewise function:

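      Eq. 1: $P(y = 0 \mid p, \lambda) = (1 - p) + p\,e^{-\lambda}$

      Eq. 2: $P(y \mid p, \lambda) = p\,\dfrac{\lambda^{y} e^{-\lambda}}{y!}, \quad y > 0$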

      where y denotes the observed peak firing rate on a given stride (determined by convolving motor unit spike times with a Gaussian kernel as described above), p denotes the probability of recruitment, and λ denotes the expected peak firing rate from a Poisson distribution of outcomes. Thus, an inactive unit on a given stride may be the result of either non-recruitment or recruitment with a stochastically zero firing rate. The above equations were fit by minimizing the negative log-likelihood of the parameters given the data.”
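      The piecewise model in Eq. 1 and 2 is a zero-inflated Poisson likelihood. The sketch below writes out its negative log-likelihood and fits p and λ by maximum likelihood. Two assumptions should be noted. First, it uses expectation-maximization, a swapped-in optimizer that targets the same maximum-likelihood estimate as direct minimization of the negative log-likelihood, which the quoted Methods describe. Second, it assumes the per-stride values y have been discretized to non-negative integers, as a Poisson likelihood requires. The function names and iteration count are hypothetical.

```python
import math


def zip_neg_log_likelihood(counts, p, lam):
    """Negative log-likelihood of Eq. 1-2 for non-negative integer values y."""
    nll = 0.0
    for y in counts:
        if y == 0:
            # Eq. 1: not recruited, or recruited with a Poisson draw of zero
            nll -= math.log((1.0 - p) + p * math.exp(-lam))
        else:
            # Eq. 2: recruited, with Poisson-distributed value y
            nll -= math.log(p) + y * math.log(lam) - lam - math.lgamma(y + 1)
    return nll


def fit_zip(counts, n_iter=500):
    """Maximum-likelihood (p, lam) by expectation-maximization."""
    n = len(counts)
    total = sum(counts)
    if total == 0:
        return 0.0, 0.0  # no activity at all: p and lam are not separable
    n_zero = sum(1 for y in counts if y == 0)
    p, lam = 0.5, total / n
    for _ in range(n_iter):
        # E-step: posterior probability that a zero-valued stride was recruited
        z0 = p * math.exp(-lam) / ((1.0 - p) + p * math.exp(-lam))
        expected_recruited = (n - n_zero) + n_zero * z0
        # M-step: closed-form updates given the expected recruitment
        p = expected_recruited / n
        lam = total / expected_recruited
    return p, lam


counts = [0] * 60 + [2, 3, 4, 3, 2, 5, 3, 4, 2, 3] * 4
p_hat, lam_hat = fit_zip(counts)
```

      With many zero strides, the fitted λ is well above the raw mean because part of the zeros is attributed to non-recruitment (1 − p) rather than to a low rate.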

      “Permutation test for joint model of rate and recruitment and type 2 regression slopes

      To quantify differences in firing patterns across walking speeds, we subdivided each mouse’s total set of strides into speed quartiles and calculated rate (𝜆, Eq. 1 and 2, Fig. 5A-C) and recruitment probability terms (p, Eq. 1 and 2, Fig. 5D-F) for each unit in each speed quartile. Here we calculated the difference in both the rate and recruitment terms across the fastest and slowest speed quartiles (p_fast − p_slow and 𝜆_fast − 𝜆_slow). To test whether these model parameters were significantly different depending on locomotor speed, we developed a null model combining strides from both the fastest and slowest speed quartiles. After pooling strides from both quartiles, we randomly distributed the pooled set of strides into two groups with sample sizes equal to the original slow and fast quartiles. We then calculated the model parameters for each new group and found the difference between like terms. To estimate the null distribution of these differences, we repeated this procedure for 1000 random redistributions (permutations) of the pooled set of strides. The central 95% interval of this null distribution reflects the range of differences expected under the null hypothesis of no difference between groups. Thus, the null hypothesis can be rejected if the observed difference in rate or recruitment terms falls outside this interval.

      We followed a similar procedure to quantify cross-muscle differences in the relationship between firing parameters. For each muscle, we estimated the slope across firing parameters for each motor unit using type 2 regression. In this case, the true difference was the difference in slopes between muscles. To test the null hypothesis that there was no difference in slopes, the null model reflected the pooled set of units from both muscles. Again, slopes were calculated for 1000 random resamplings of this pooled data to estimate the 95% confidence interval.”
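      Both permutation procedures described above share one structure: compute a statistic for each of two groups, then shuffle the pooled items into groups of the original sizes many times to build a null distribution of the difference. The sketch below reflects that structure under several assumptions. `recruitment_fraction` is a simplified, hypothetical stand-in for the fitted p term (the full Eq. 1-2 fit could be passed in its place). Type 2 regression is taken here to mean reduced major axis regression, one common form. The 95% bounds are read from the sorted null values, which is one of several percentile conventions.

```python
import math
import random


def permutation_interval(group_a, group_b, statistic, n_perm=1000, seed=0):
    """Observed statistic(b) - statistic(a) and the central ~95% interval of
    that difference when pooled items are randomly reassigned to two groups
    of the original sizes (null hypothesis: no group difference)."""
    rng = random.Random(seed)
    observed = statistic(group_b) - statistic(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        null.append(statistic(pooled[n_a:]) - statistic(pooled[:n_a]))
    null.sort()
    return observed, (null[int(0.025 * n_perm)], null[int(0.975 * n_perm) - 1])


def recruitment_fraction(stride_counts):
    """Simplified stand-in for p: fraction of strides with at least one spike."""
    return sum(c > 0 for c in stride_counts) / len(stride_counts)


def rma_slope(xs, ys):
    """Reduced major axis (type 2) slope: sign(r) * sd(y) / sd(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return math.copysign(math.sqrt(syy / sxx), sxy)


def unit_slope(units):
    """Slope across units, each unit an (x, y) pair of firing parameters."""
    return rma_slope([u[0] for u in units], [u[1] for u in units])
```

      For the speed comparison, `group_a` and `group_b` would be the slow- and fast-quartile strides of one unit. For the cross-muscle comparison, they would be the per-unit parameter pairs of each muscle, passed with `unit_slope` (each group needs at least two units with distinct x values).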

      The argument for delayed activation of the lateral head is interesting, but I am not comfortable saying the nervous system creates a delay just based on observations of the mean time of the first spike, given the potential for differential variability in spike timing across muscles and MUs. One way to make a strong case for a delay would be to show aggregate PSTHs for all the spikes from all the MUs for each of the two heads. That would distinguish between a true delay and more gradual or variable activation between the heads.

      This is a good point, and we agree that the claim made about the nervous system is too strong given the results. Even with the aggregate PSTHs the Reviewer suggested (Author response image 2), there is still not enough evidence to isolate the role of the nervous system in the muscles’ activation.

      Author response image 2.

      Aggregate peristimulus time histogram (PSTH) for all motor unit spike times in the long head (top) and lateral head (bottom) within the stride.
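      An aggregate PSTH of the kind shown in Author response image 2 pools the within-stride spike times of every unit recorded from one muscle head into a single histogram. The sketch below assumes spike times are already expressed relative to stride onset, and it normalizes counts to spikes per second per unit per stride. The 20 ms bin width and 0.5 s window are hypothetical defaults, not values from the study.

```python
def aggregate_psth(spike_times_by_unit, n_strides, bin_s=0.02, window_s=(0.0, 0.5)):
    """Pooled within-stride spike histogram for one muscle head.

    spike_times_by_unit: one list per motor unit of spike times (s) relative
    to the onset of the stride in which each spike occurred, pooled over
    n_strides strides. Returns the mean rate per bin (spikes/s per unit).
    """
    start, stop = window_s
    n_bins = int(round((stop - start) / bin_s))
    counts = [0] * n_bins
    for unit_times in spike_times_by_unit:
        for t in unit_times:
            k = int((t - start) // bin_s)
            if 0 <= k < n_bins:
                counts[k] += 1
    scale = 1.0 / (bin_s * n_strides * len(spike_times_by_unit))
    return [c * scale for c in counts]
```

      Comparing the two heads' PSTHs separates a shift in onset (a delay) from a broader, more variable spread of spike times, which a comparison of mean first-spike times alone cannot do.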

      In the ideal case, we would have more simultaneous recordings from both muscles to make a more direct claim about the delay. Still, within the current scope of the paper, to correct this and better describe the difference in timing of muscle activity, we edited the text to the following:

      “These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, the motor pool for the long head becomes active roughly 100 ms before the motor pool supplying the lateral head during locomotion (Figure 3C).”

      The results from Marshall et al. 2022 suggest that the recruitment of some MUs is not just related to muscle force, but also the frequency of force variation - some of their MUs appear to be recruited only at certain frequencies. Figure 5C could have shown signs of this, but it does not appear to. We do not really know the force or its frequency of variation in the measurements here. I wonder whether there is additional analysis that could address whether frequency-dependent recruitment is present. It may not be addressable with the current data set, but this could be a fruitful direction to explore in the future with MU recordings from mice.

      We agree that this would be a fruitful direction to explore; however, the Reviewer is correct that this is not easily addressable with the present dataset. As the Reviewer points out, stride frequency increases with speed, potentially offering the opportunity to examine how motor unit activity varies with the frequency, phase, and amplitude of locomotor movements. However, given our lack of force data (either joint torques or ground reaction forces), we cannot readily dissociate the frequency/phase/amplitude of skeletal kinematics from the frequency/phase/amplitude of muscle force. Marshall et al. (2022) mitigated these issues by using an isometric force-production task. Therefore, while we agree that it would be a major contribution to extend such investigations to whole-body movements like locomotion, given the complexities described above we believe this is a project for the future, and beyond the scope of the present study.

      Minor:

      Page 5: "Units often displayed no recruitment in a greater proportion of strides than for any particular spike count when recruited (Figures 2A, B)," - I had to read this several times to understand it. I suggest rephrasing for clarity.

      We have changed the text to read:

      “Units demonstrated a variety of firing patterns, with some units producing 0 spikes more frequently than any non-zero spike count (Figure 2A, B),...”

      Figure 3 legend: "Mean phase (± SE) of motor unit burst duration across all strides.": It is unclear what this means - durations are not usually described as having a phase. Do we mean the onset phase?

      We have changed the text to read:

      “Mean phase ± SE of motor unit burst activity within each stride”

      Page 9: "suggesting that the recruitment of individual motor units in the lateral and long heads might have significant (and opposite) effects on elbow angle in strides of similar speed (see Discussion)." I wouldn't say "opposite" here - that makes it sound like the authors are calling the long head a flexor. The authors should rephrase or clarify the sense in which they are opposite.

      This is a fair point and we agree we should not describe the muscles as ‘opposite’ when both muscles are extensors. We have removed the phrase ‘and opposite’ from the text.

      Page 11: "in these two muscles across in other quadrupedal species" - typo.

      We have corrected this error.

      Page 16: This reviewer cannot decipher after repeated attempts what the first two sentences of the last paragraph mean. - “Future studies might also use perturbations of muscle activity to dissociate the causal properties of each motor unit’s activity from the complex correlation structure of locomotion. Despite the strong correlations observed between motor unit recruitment and limb kinematics (Fig. 6, Supplemental Fig. 3), these results might reflect covariations of both factors with locomotor speed rather than the causal properties of the recorded motor unit.”

      For better clarity, we have changed the text to read:

      “Although strong correlations were observed between motor unit recruitment and limb kinematics during locomotion (Figure 6, Figure 6–figure supplement 1), it remains unclear whether such correlations actually reflect the causal contributions that those units make to limb movement. To resolve this ambiguity, future studies could use electrical or optical perturbations of muscle contraction levels (Kim et al., 2024; Lu et al., 2024; Srivastava et al., 2015, 2017) to test directly how motor unit firing patterns shape locomotor movements. The short-latency effects of patterned motor unit stimulation (Srivastava et al., 2017) could then reveal the sensitivity of behavior to changes in muscle spiking and the extent to which the same behaviors can be performed with many different motor commands.”

      Reviewer #2 (Recommendations for the authors):

      Minor comments:

      Introduction:

      (1) "Although studies in primates, cats, and zebrafish have shown that both the number of active motor units and motor unit firing rates increase at faster locomotor speeds (Grimby, 1984; Hoffer et al., 1981, 1987; Marshall et al., 2022; Menelaou & McLean, 2012)." I would remove Marshall et al. (2022) as their monkeys performed pulling tasks with the upper limb. You can alternatively remove locomotor from the sentence and replace it with contraction speed.

      Thank you for the comment. While we intended to reference this specific paper to highlight the rhythmic activity in muscles, we agree that this deviates from ‘locomotion’ as it is referenced in the other cited papers which study body movement. We have followed the Reviewer’s suggestion to remove the citation to Marshall et al.

      (2) "The capability and need for faster force generation during dynamic behavior could implicate motor unit recruitment as a primary mechanism for modulating force output in mice."

      The authors could add citations to this sentence of works showing that recruitment speed is the main determinant of the rate of force development (see, for example, J. L. Dideriksen, A. Del Vecchio, D. Farina, Neural and muscular determinants of maximal rate of force development. J Neurophysiol 123, 149-157 (2020)).

      Thank you for pointing out this important reference. We have included this as a citation as recommended.

      Results:

      (3) "Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in the triceps brachii (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units (Figure 1E) as described previously (Chung et al., 2023)."

      This sentence can be misleading for the reader as the array used by the researchers has 4 threads of 8 electrodes. Would it be possible to specify the number of electrodes implanted per head of interest? I assume 8 per head in most mice (or 4 bipolar channels), even if that's not specifically written in the manuscript.

      Thank you for the suggestion. As described above, we have added Table 1, which includes all array locations, and we edited the statement referenced in the comment as follows:

      “Electrode arrays (32-electrode Myomatrix array model RF-4x8-BHS-5) were implanted in forelimb muscles (note that Figure 1D shows the EMG signal from only one of the 16 bipolar recording channels), and the resulting data were used to identify the spike times of individual motor units in the triceps brachii long and lateral heads (Table 1, Figure 1E) as described previously (Chung et al., 2023).”

      (4) "These findings demonstrate that despite the overlapping biomechanical functions of the long and lateral heads of the triceps, the nervous system creates a consistent, approximately 100 ms delay (Figure 3C) between the activation of the two muscles' motor neuron pools. This timing difference suggests distinct patterns of synaptic input onto motor neurons innervating the lateral and long heads."

      Both muscles don't have fully overlapping biomechanical functions, as one of them also acts on the shoulder joint. Please be more specific in this sentence, saying that both muscles are synergistic at the elbow level rather than "have overlapping biomechanical functions".

      We agree with the above reasoning and that our manuscript should be clearer on this point. We edited the above text in accordance with the Reviewer suggestion as follows:

      "These findings demonstrate that despite the synergistic (extensor) function of the long and lateral heads of the triceps at the elbow, …”

      (5) "Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role."

      It is difficult to draw such an affirmative conclusion about the synaptic inputs from the data presented by the authors. The differences in firing rates may arise solely from factors other than distinct synaptic inputs, such as different intrinsic properties of the motoneurons or distinct neuromodulatory inputs.

      To better explain our findings, we adjusted the above text in the Results (see “Motor unit firing patterns in the long and lateral heads of the triceps”):

      “Together with the differences in burst timing shown in Figure 3B, these results again suggest that the motor pools for the lateral and long heads of the triceps receive distinct patterns of synaptic input, although differences in the intrinsic physiological properties of motor neurons innervating the two muscles might also play an important role.”

      We also included the following distinction in the Discussion (see “Differences in motor unit activity patterns across two elbow extensors”) to address the other plausible mechanisms mentioned.

      “The large differences in burst timing and spike patterning across the muscle heads suggest that the motor pools for each muscle receive distinct inputs. However, differences in the intrinsic physiological properties of motor units and neuromodulatory inputs across motor pools might also make substantial contributions to the structure of motor unit spike patterns (Martínez-Silva et al., 2018; Miles & Sillar, 2011).”

      (6) "We next examined whether the probabilistic recruitment of individual motor units in the triceps and elbow extensor muscle predicted stride-by-stride variations in elbow angle kinematics."

      I'm not sure that the wording is appropriate here. The analysis does not predict elbow angle variations from parameters extracted from the spiking activity. It rather compares the average elbow angle between two conditions (motor unit active or not active).

      We thank the Reviewer for this comment and agree that the wording could be improved here to better reflect our analysis. To temper our claim, we replaced the word ‘predict’ with ‘correlates’ in the above text and throughout the paper when discussing this result.
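      The comparison the reviewer describes (mean elbow angle on strides with vs. without a given motor unit active) can be sketched as a difference of mean traces. This is a minimal illustration that assumes each stride's elbow-angle trace has already been time-normalized to a common number of samples; the function names and data layout are hypothetical, not the authors' code. Because it compares group means rather than fitting a model that forecasts held-out strides, it is a correlational analysis, which is the distinction behind the wording change.

```python
import statistics

def mean_trace(traces):
    """Average several equal-length elbow-angle traces point by point."""
    return [statistics.fmean(values) for values in zip(*traces)]

def active_minus_inactive(strides):
    """Mean elbow-angle trace on strides where the unit fired, minus the
    mean trace on strides where it did not.

    strides: list of (unit_active, angle_trace) pairs; every angle_trace is
    assumed to be time-normalized to the same number of samples.
    """
    active = [trace for fired, trace in strides if fired]
    inactive = [trace for fired, trace in strides if not fired]
    if not active or not inactive:
        raise ValueError("need strides both with and without the unit active")
    return [a - b for a, b in zip(mean_trace(active), mean_trace(inactive))]

# Synthetic two-sample traces in degrees (not data from the study).
strides = [(True, [100.0, 140.0]), (True, [104.0, 144.0]), (False, [98.0, 130.0])]
print(active_minus_inactive(strides))  # [4.0, 12.0]
```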

      Methods:

      (7) "Using the four threads on the customizable Myomatrix array (RF-4x8-BHS-5), we implanted a combination of muscles in each mouse, sometimes using multiple threads within the same muscle. [...] Some mice also had threads simultaneously implanted in their ipsilateral or contralateral biceps brachii although no data from the biceps is presented in this study."

      A precise description of the localisation of the array (muscles and the number of arrays per muscle) for each animal would be appreciated.

      (8) "A total of 33 units were identified and manually verified across all animals." A precise description of the number of motor units concurrently identified per muscle and per animal would be appreciated. Moreover, please add details on the manual inspection. Does it involve the manual selection of missing spikes? What are the criteria for considering an identified motor unit as valid?

      As discussed earlier, we added Table 1 to the main text to provide the details mentioned in the above comments.

      Regarding spike sorting, given the very large number of spikes recorded, we did not rely on manually adjusting mislabeled spikes. Instead, as described in the revised Methods section, we verified unit isolation by ensuring that more than 98% of each unit's spikes were separated from one another by more than 1 ms. Moreover, as described above, we have added new analyses (Figure 1–figure supplement 1) confirming the stability of motor unit waveforms both across the duration of individual recording sessions (roughly 30 minutes) and across the rapid changes in limb position within individual stride cycles (roughly 250 ms).
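      The isolation criterion can be expressed as a short check. This sketch assumes that "outside of 1 ms of each other" means a spike has no other spike from the same unit within 1 ms on either side. A common alternative counts the fraction of inter-spike intervals longer than 1 ms instead. The function names and the strict ">" comparison against 98% are illustrative assumptions, not the authors' implementation.

```python
def fraction_isolated_spikes(spike_times_s, window_s=0.001):
    """Fraction of a unit's spikes with no other spike of the same unit
    closer than window_s seconds, before or after it."""
    t = sorted(spike_times_s)
    n = len(t)
    if n == 0:
        raise ValueError("unit has no spikes")
    isolated = 0
    for i, ti in enumerate(t):
        close_before = i > 0 and ti - t[i - 1] < window_s
        close_after = i < n - 1 and t[i + 1] - ti < window_s
        if not (close_before or close_after):
            isolated += 1
    return isolated / n

def passes_isolation_criterion(spike_times_s, window_s=0.001, min_fraction=0.98):
    """Hypothetical acceptance rule: more than min_fraction of spikes isolated."""
    return fraction_isolated_spikes(spike_times_s, window_s) > min_fraction

# Two spikes 0.5 ms apart violate the window; the other two are isolated.
print(fraction_isolated_spikes([0.0, 0.010, 0.0105, 0.050]))  # 0.5
```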

      Reviewer #3 (Recommendations for the authors):

      Figure 2 (and supplement) show spike count distributions with strong positive skewness, which is in accordance with the prediction of a fluctuation-driven regime. I suggest plotting these on a logarithmic x-axis (in addition to the linear axis), which should reveal a bell-shaped distribution, maybe even Gaussian, in a majority of the units.

      We thank the Reviewer for the suggestion. We present the requested analysis (Author response image 3), which shows bell-shaped distributions for some (but not all) units. However, we believe that investigating why some replotted distributions are Gaussian and others are not falls beyond the scope of this paper and likely requires a larger dataset than the one we were able to obtain.

      Author response image 3.

      Spike count distributions for each motor unit on a logarithmic x-axis.
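      The reviewer's reasoning is that lognormally distributed spike counts, as expected in a fluctuation-driven regime (Petersen & Berg, 2016), are strongly right-skewed on a linear axis but roughly symmetric on a logarithmic one. The sketch below is a numerical complement to plotting, using synthetic lognormal counts rather than data from the study. It compares moment-based skewness before and after a log transform. Dropping zero counts before taking logarithms is an assumption of this sketch.

```python
import math
import random
import statistics

def skewness(xs):
    """Population (moment-based) skewness of a sample."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)

def log10_counts(counts):
    """Log-transform spike counts; zeros have no logarithm and are dropped."""
    return [math.log10(c) for c in counts if c > 0]

# Synthetic, lognormally distributed per-stride spike counts (not study data).
rng = random.Random(0)
counts = [round(rng.lognormvariate(2.0, 0.6)) for _ in range(5000)]

print(f"skewness, linear axis: {skewness(counts):.2f}")
print(f"skewness, log axis:    {skewness(log10_counts(counts)):.2f}")
```

A skewness near zero after the transform is consistent with, though not proof of, a lognormal shape; units whose log-transformed counts stay skewed would be the "not all" cases.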

      Why not more data? I tried to get an overview of how much data was collected.

      Supplemental Figure 1 has all the isolated units, which amounts to 38 (are the colors the two muscle types?). Given there are 16 leads in each myomatrix, in two muscles, of six mice, this seems like a low yield. Could the authors comment on the reasons for this low yield?

      Regarding motor unit yield, even with multiple electrodes per muscle and a robust sorting algorithm, we often isolated only a few units per muscle. This yield likely reflects two factors. First, because of the highly dynamic nature of locomotion and high levels of muscle contraction, isolating individual spikes reliably across different locomotor speeds is inherently challenging, regardless of the algorithm being employed. Second, because the results of spike-train analyses can be highly sensitive to sorting errors, we have only included the motor units that we can sort with the highest possible confidence across thousands of strides.

      Minor:

      Figure captions, especially Figure 6: the text is excessively long. Can the text be shortened?

      We thank the Reviewer for this comment. Generally, we seek to include a description of the methods and results within the figure captions, but we concede that we can condense the information in some cases. In a number of cases, we have moved some of the descriptive text from the caption to the Methods section.

      References

      Berg, R. W. (2017). Neuronal Population Activity in Spinal Motor Circuits: Greater Than the Sum of Its Parts. Frontiers in Neural Circuits, 11. https://doi.org/10.3389/fncir.2017.00103

      Biewener, A. A., Blickhan, R., Perry, A. K., Heglund, N. C., & Taylor, C. R. (1988). Muscle Forces During Locomotion in Kangaroo Rats: Force Platform and Tendon Buckle Measurements Compared. Journal of Experimental Biology, 137(1), 191–205. https://doi.org/10.1242/jeb.137.1.191

      Chung, B., Zia, M., Thomas, K. A., Michaels, J. A., Jacob, A., Pack, A., Williams, M. J., Nagapudi, K., Teng, L. H., Arrambide, E., Ouellette, L., Oey, N., Gibbs, R., Anschutz, P., Lu, J., Wu, Y., Kashefi, M., Oya, T., Kersten, R., … Sober, S. J. (2023). Myomatrix arrays for high-definition muscle recording. eLife, 12, RP88551. https://doi.org/10.7554/eLife.88551

      De Luca, C. J. (1985). Control properties of motor units. Journal of Experimental Biology, 115(1), 125–136. https://doi.org/10.1242/jeb.115.1.125

      De Luca, C. J., & Erim, Z. (1994). Common drive of motor units in regulation of muscle force. Trends in Neurosciences, 17(7), 299–305. https://doi.org/10.1016/0166-2236(94)90064-7

      Farina, D., Negro, F., & Dideriksen, J. L. (2014). The effective neural drive to muscles is the common synaptic input to motor neurons. The Journal of Physiology, 592(16), 3427–3441. https://doi.org/10.1113/jphysiol.2014.273581

      Hartigan, P. M. (1985). Algorithm AS 217: Computation of the Dip Statistic to Test for Unimodality. Applied Statistics, 34(3), 320. https://doi.org/10.2307/2347485

      Henneman, E., Somjen, G., & Carpenter, D. O. (1965). Functional significance of cell size in spinal motoneurons. Journal of Neurophysiology, 28(3), 560–580. https://doi.org/10.1152/jn.1965.28.3.560

      Karabulut, D., Dogru, S. C., Lin, Y.-C., Pandy, M. G., Herzog, W., & Arslan, Y. Z. (2020). Direct Validation of Model-Predicted Muscle Forces in the Cat Hindlimb During Locomotion. Journal of Biomechanical Engineering, 142(5), 051014. https://doi.org/10.1115/1.4045660

      Kim, J. J., Wyche, I. S., Olson, W., Lu, J., Bakir, M. S., Sober, S. J., & O’Connor, D. H. (2024). Myo-optogenetics: Optogenetic stimulation and electrical recording in skeletal muscles. https://doi.org/10.1101/2024.06.21.600113

      Lu, J., Zia, M., Baig, D. A., Yan, G., Kim, J. J., Nagapudi, K., Anschutz, P., Oh, S., O’Connor, D., Sober, S. J., & Bakir, M. S. (2024). Opto-Myomatrix: μLED integrated microelectrode arrays for optogenetic activation and electrical recording in muscle tissue. https://doi.org/10.1101/2024.07.01.601601

      Manuel, M., & Heckman, C. J. (2011). Adult mouse motor units develop almost all of their force in the subprimary range: A new all-or-none strategy for force recruitment? Journal of Neuroscience, 31(42), 15188–15194. https://doi.org/10.1523/JNEUROSCI.2893-11.2011

      Marshall, N. J., Glaser, J. I., Trautmann, E. M., Amematsro, E. A., Perkins, S. M., Shadlen, M. N., Abbott, L. F., Cunningham, J. P., & Churchland, M. M. (2022). Flexible neural control of motor units. Nature Neuroscience, 25(11), 1492–1504. https://doi.org/10.1038/s41593-022-01165-8

      Martínez-Silva, M. de L., Imhoff-Manuel, R. D., Sharma, A., Heckman, C. J., Shneider, N. A., Roselli, F., Zytnicki, D., & Manuel, M. (2018). Hypoexcitability precedes denervation in the large fast-contracting motor units in two unrelated mouse models of ALS. eLife, 7, e30955. https://doi.org/10.7554/eLife.30955

      Miles, G. B., & Sillar, K. T. (2011). Neuromodulation of Vertebrate Locomotor Control Networks. Physiology, 26(6), 393–411. https://doi.org/10.1152/physiol.00013.2011

      Petersen, P. C., & Berg, R. W. (2016). Lognormal firing rate distribution reveals prominent fluctuation–driven regime in spinal motor networks. eLife, 5, e18805. https://doi.org/10.7554/elife.18805

      Srivastava, K. H., Elemans, C. P. H., & Sober, S. J. (2015). Multifunctional and Context-Dependent Control of Vocal Acoustics by Individual Muscles. The Journal of Neuroscience, 35(42), 14183–14194. https://doi.org/10.1523/JNEUROSCI.3610-14.2015

      Srivastava, K. H., Holmes, C. M., Vellema, M., Pack, A. R., Elemans, C. P. H., Nemenman, I., & Sober, S. J. (2017). Motor control by precisely timed spike patterns. Proceedings of the National Academy of Sciences of the United States of America, 114(5), 1171–1176. https://doi.org/10.1073/pnas.1611734114

    1. eLife Assessment

      The authors use single-molecule imaging and in vivo loop-capture genomic approaches to investigate estrogen-mediated enhancer-target gene activation in human cancer cells. These potentially important results suggest that ER-alpha can, with a temporal delay, activate a non-target gene, TFF3, which lies in proximity to the main target gene TFF1, even though the estrogen-responsive enhancer does not loop with the TFF3 promoter. To explain these results, the authors invoke a transcriptional condensate model. The claims of a temporal delay and of effects of target-gene transcription on non-target gene expression are supported by solid evidence, but there is no direct evidence of a role for a condensate in mediating this effect. The reviewers appreciate that the authors have done a lot of work to strengthen the study. This work will be of interest to those studying transcriptional gene regulation and hormone-aggravated cancers.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively affect TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low-level expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion. However, the authors have greatly improved the manuscript during the revision.

      Comments on latest version:

      The authors have done a lot of work for the revision. The manuscript has been greatly improved.

    3. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bohra et al. measure the effects of estrogen-responsive gene induction on nearby target genes, using a TAD containing the genes TFF1 and TFF3 as a model. The authors propose that there is a sort of competition for transcriptional machinery between TFF1 (estrogen responsive) and TFF3 (not responsive), such that when TFF1 is activated and machinery is recruited, TFF3 is activated after a time delay. The authors attribute this time delay to transcriptional machinery, previously sequestered at TFF1, becoming available to the proximal TFF3 locus. Using enhancer deletion, the authors demonstrate that this activation does not depend on contact with the TFF1 enhancer; instead, they conclude that it depends on a phase-separated condensate that can sequester transcriptional machinery. Although the manuscript reports an interesting observation, namely a dose dependence and a time delay in the expression of TFF1 relative to TFF3, there is much room for improvement in the analysis and reporting of the data. Most importantly, there is no direct test of condensate formation at the locus in the context of this study, i.e., dissolution upon enhancer deletion, decay over time, and dependence of TFF1 expression on condensate formation. Given current knowledge of its non-specificity and multitude of indirect effects, 1,6-hexanediol is not adequate for drawing conclusions about the effect of condensates on a specific gene's activity. Thus, in my opinion, the major claim that the time-delayed expression of TFF3 depends on condensates is not supported by the current data.

      Strengths:

      The dependence of TFF1 expression on a single enhancer and the temporal delay in TFF3 expression are very interesting findings.

      The non-linear dependence of TFF1 and TFF3 expression on ER concentration is very interesting, with potentially broader implications.

      The combined use of smFISH, enhancer deletion, and 4C to build a coherent model is a good approach.

      Weaknesses:

      There is no direct observation of a condensate at the TFF1 and TFF3 locus, and no test of how this condensate changes over time after E2 treatment or upon enhancer deletion, of whether transcriptional machinery is indeed concentrated within it, or of other claims about condensate function and formation made in the manuscript. The use of 1,6-hexanediol (1,6-HD) is not appropriate to test this idea, given how broadly it acts.

      Comments on latest version:

      I don't think the response to Reviewer 2's comment on LLPS condensates at TFF1 is adequate, and given that this point is essential to the claims of the manuscript, it must be addressed. Namely, the data from Saravanan, 2020 actually suggest that condensate formation at the locus is not very predictive and barely enriched over random spots. The claims in the manuscript that the condensate is responsible for sequestering transcriptional machinery are quite strong and are the crux of the current model. To continue to make this claim (which I don't think is necessary, since there are other possible models), the authors must test whether the condensate at this locus (1) shows time-dependent behavior, (2) is absent or weakened at the locus in cells that show high TFF3 expression, and (3) is indeed enriched for transcriptional machinery when TFF1 expression peaks. The use of 1,6-hexanediol is not appropriate, as pointed out by Reviewer 2, and is no longer considered an appropriate experiment by many, as the whole notion of LLPS forming nuclear condensates is now under question. Such condensates can form through a variety of mechanisms, as reviewed, for example, by Mittag and Pappu (A conceptual framework for understanding phase separation and addressing open questions and challenges, Molecular Cell, 2022). Furthermore, given the distance between TFF1 and TFF3, it is hard to imagine how a condensate that concentrates machinery in a non-stoichiometric manner could form without boosting expression of both genes, rather than being specific to one. In my opinion, there must be another mechanism.

      I would recommend the authors remove this aspect of their manuscript/model and simply report their interesting findings that are actually supported by data: The temporal delay of TFF3 expression, the dependence on ER concentration, and the enhancer dependence.

    4. Author response:

      The following is the authors’ response to the current reviews.

      We are pleased that Reviewer 3 appreciated our findings and found the temporal lag between the expression of TFF1 and TFF3 during signaling particularly interesting. The reviewer also advised us not to overemphasize that this lag arises from phase separation of ERα at the TFF1 locus, as the use of 1,6-hexanediol alone is not sufficient to conclusively establish whether ERα condensates undergo liquid–liquid phase separation. We agree with this assessment and have revised the manuscript accordingly. Specifically, we have modified the title to remove reference to phase separation and have updated the text throughout the manuscript to avoid claiming that the observed condensates are a result of phase separation. The revised title is: “Ligand-dependent Enhancer Activation Indirectly Modulates Non-target Promoters in a Chromatin Domain.”

      With these changes, we are proceeding with the Version of Record using the revised version of the manuscript.

      ———

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      The manuscript by Bohra et al. describes the indirect effects of ligand-dependent gene activation on neighboring non-target genes. The authors utilized single-molecule RNA-FISH (targeting both mature and intronic regions), 4C-seq, and enhancer deletions to demonstrate that the non-enhancer-targeted gene TFF3, located in the same TAD as the target gene TFF1, alters its expression when TFF1 expression declines at the end of the estrogen signaling peak. Since the enhancer does not loop with TFF3, the authors conclude that mechanisms other than estrogen receptor or enhancer-driven induction are responsible for TFF3 expression. Moreover, ERα intensity correlations show that both high and low levels of ERα are unfavorable for TFF1 expression. The ERα level correlations are further supported by overexpression of GFP-ERα. The authors conclude that the transcriptional machinery used by TFF1 for its acute activation can negatively affect TFF3 at the peak of signaling, but once the condensate dissolves, TFF3 benefits from it for its low-level expression.

      Strengths:

      The findings are indeed intriguing. The authors have maintained appropriate experimental controls, and their conclusions are well-supported by the data.

      Weaknesses:

      There are some major and minor concerns that relate to the approach, data presentation, and discussion, but I think they can be addressed with further effort.

      We thank the reviewer for their positive comments on the paper. We have addressed all their specific recommendations below.  

      Deletion of the enhancer reveals the absolute reliance of TFF1 on its enhancers for its expression. The authors should elaborate more on this, as it is an important finding.

      We thank the reviewer for the comment. We have now added a more detailed discussion of the requirement of the enhancer for TFF1 expression in the revised manuscript (lines 368-385).

      In Fig. 1, TFF3 expression is shown by qRT-PCR to be induced upon E2 signaling, whereas smFISH does not display a similar pattern. The authors attribute this discrepancy to the overall low expression of TFF3. In my opinion, this argument could be further supported by relevant literature, if available. Additionally, does GRO-seq data reveal any changes in TFF3 expression following estrogen stimulation? The GRO-seq track shown in Fig. 1 should be rescaled to TFF3 expression levels so that its expression changes can be appreciated.

      We have now included a browser shot of the TFF3 region showing the GRO-seq signal over the E2 time course (Fig. S1C). We observed increased transcription towards the 3’ end of the TFF3 gene body at 3h. This increased transcription at 3h is consistent with the smFISH data. The relative changes in TFF3 expression measured by qRT-PCR and by smFISH for intronic transcripts are somewhat different; we speculate that the bias in measurements that depend on PCR amplification could be greater for genes expressed at low levels, and that smFISH using intronic probes may be a more sensitive assay for detecting such changes.

      Since the mutually exclusive relationship between TFF1 and TFF3 is based on snapshots in fixed cells, can the authors comment on whether the same cell that expresses TFF1 at 1h expresses TFF3 at 3h? Perhaps calculations using the total number of cells that express these genes at 1h and 3h would be useful.

      As the reviewer points out, because these are fixed cells, we cannot comment on the fate of the same cell at two time points. Future work could address this limitation by using cells with endogenous tags for TFF1 and TFF3 together with live-cell imaging. In a fixed-cell assay, however, it can be investigated, as the reviewer suggests, whether the fraction of cells showing high TFF3 expression at 3h is similar to the fraction showing high TFF1 expression at 1h. To quantify these fractions, we plotted the fraction of cells showing high TFF1 and TFF3 expression at 1h and 3h. We identified truly high-expressing cells using the mean plus one standard deviation of the single-cell data as the threshold: at E2-1hr for TFF1 (80 or more transcripts) and at E2-3hr for TFF3 (36 or more transcripts). The fraction with high TFF1 expression at 1h (12.06 ± 2.1) is indeed comparable to that with high TFF3 expression at 3h (12.50 ± 2.0) (Fig. 2C and Author response image 1). We note that if the transcript counts were normally distributed, a predetermined fraction would be expected to lie above these thresholds, so comparable fractions could arise simply from the underlying statistics. In our experiments, however, this is unlikely to be the case, given the many outliers that affect both the mean and the standard deviation, and the lack of normality and high dispersion in the single-cell distributions. Of course, even though the fractions are comparable, we cannot be certain that the same set of cells goes from high TFF1 expression to high TFF3 expression, but that is certainly a possibility. We thank the reviewer for suggesting this comparison.

      Author response image 1.

      The graph shows the percentage of cells with high expression of TFF1 and TFF3 at 1h and 3h post E2 signaling. The threshold was calculated by pooling absolute RNA counts from 650 analyzed cells (as in Fig. 2C). The mean and standard deviation over the single-cell data were calculated, and the mean plus one standard deviation was used as the threshold for identifying high-expressing cells. Because TFF1 expression peaks at 1h, its threshold was 80; because TFF3 expression peaks at 3h, its threshold was 36. The fractions of cells expressing above 80 (TFF1) and 36 (TFF3) transcripts were calculated from three independent repeats. The mean of means and the standard deviations from the three experiments are plotted here.
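      The thresholding described above (mean plus one standard deviation of pooled single-cell counts, then the percentage of cells at or above it, summarized across repeats) can be sketched as follows. This is a minimal illustration with synthetic counts, not the authors' analysis code. Whether the population or sample standard deviation was used is not stated, so the choice of `pstdev` for the threshold and `stdev` across repeats is an assumption. The sketch also computes the roughly 15.9% of values that would lie above mean + 1 SD if counts were normally distributed, which is the "predetermined fraction" caveat raised in the response.

```python
import statistics

def high_expression_threshold(pooled_counts):
    """Mean plus one standard deviation of pooled single-cell transcript counts.

    Uses the population SD (pstdev); the SD flavor is an assumption.
    """
    return statistics.fmean(pooled_counts) + statistics.pstdev(pooled_counts)

def percent_high(counts, threshold):
    """Percent of cells with transcript counts at or above the threshold."""
    return 100 * sum(c >= threshold for c in counts) / len(counts)

def summarize_repeats(percents):
    """Mean of per-repeat percentages and their sample standard deviation."""
    return statistics.fmean(percents), statistics.stdev(percents)

# Expected percent above mean + 1 SD if counts were normally distributed.
normal_tail = 100 * (1 - statistics.NormalDist().cdf(1.0))
print(f"{normal_tail:.1f}")  # 15.9

# Synthetic counts for one repeat (not data from the study).
repeat = [10, 20, 30, 40, 100]
threshold = high_expression_threshold(repeat)
print(percent_high(repeat, threshold))  # 20.0
```

With skewed, overdispersed counts the fraction above mean + 1 SD need not be near 15.9%, which is why comparable fractions for TFF1 and TFF3 are not guaranteed by construction.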

      The authors conclude that TFF3 is not directly regulated by the enhancer or the estrogen receptor. Does ERa bind the TFF3 promoter?

      The ERa ChIP-seq performed at 1h and 3h of signaling suggests that the TFF3 promoter is not bound by ERa, as shown in supplementary Fig. 1B and S1B. However, one peak upstream of the TFF1 promoter is visible, and it is lost at 3h.

      Minor comments:

      Reviewer’s comment: The figures would benefit from resizing of the panels. There is very little space between the panels.

      We have now resized the figures in the revised manuscript.

      The discussion section could include an extrapolation on the relationship between ERα concentration and transcriptional regulation. Given that ERα levels have been shown to play a critical role in breast cancer, exploring how varying concentrations of ERα affect gene expression, including the differential regulation of target and non-target genes, would provide valuable insights into the broader implications of this study.

      This is a very important point that was missing from the manuscript. We have included this in the discussion in the revised manuscript (line 426-430).

      Reviewer #2:

      Summary:

      In this manuscript by Bohra et al., the authors use the well-established estrogen response in MCF7 cells to interrogate the role of genome architecture, enhancers, and estrogen receptor concentration in transcriptional regulation. They propose there is competition between the genes TFF1 and TFF3 which is mediated by transcriptional condensates. This reviewer does not find these claims persuasive as presented. Moreover, the results are not placed in the context of current knowledge.

      Strengths:

      A high level of ERalpha expression seems to diminish the transcriptional response. Thus, the results in Fig. 4 could provide insight into ER-mediated transcription. However, this observation is not pursued in depth, for example with mutagenesis of ERalpha. Moreover, this phenomenon, which falls under the general description of non-monotonic dose response, is treated in great depth in the literature (e.g., PMID: 22419778). For example, the result the authors describe in Fig. 4 has been reported and in fact mathematically modeled in PMID 23134774. One possible avenue for improving this paper would be to dig into this result at the single-cell level using deletion mutants of ERalpha or by perturbing co-activators.

      We thank the reviewer for pointing us to the relevant literature on our observation, which will enhance the manuscript. We have discussed these findings in relation to ours in the Discussion section (lines 400-413). We thank the reviewer for the insight on non-monotonic behavior.

      Weaknesses:

      There are concerns with the sm-RNA FISH experiments. It is highly unusual to see so much intronic signal away from the site of transcription (Fig. 2) (PMID: 27932455, 30554876), which suggests to me the authors are carrying out incorrect thresholding or have a substantial amount of labelling background. The Cote paper cited in the manuscript is likewise inconsistent with their findings and is cited in a misleading manner: they see splicing within a very small region away from the site of transcription. 

      We thank the reviewer for this comment, and apologize if they feel we misrepresented the argument from Cote et al. This has now been rectified in the manuscript. However, we do not agree that the intronic signals away from the site of transcription are an artefact. First, the images presented here are representative 2D projections of 3D Z-stacks, whereas the full 3D stack is used for spot counting with a widely used algorithm that reports spot counts that are constant over a wide range of thresholds (Raj et al., 2008). The veracity of the automated counts was initially verified by comparison with manual counts. Even in the 2D representations, the extragenic intronic signals appear at thresholds similar to those of the transcription sites.

      The signal is not non-specific background labeling, for the following reasons:

      • To further support the time-course smFISH data and its interpretation without depending on the dispersed intronic signal, we have analyzed the number of alleles firing (i.e., sites of transcription) per cell at a given time under the three conditions. We counted the sites of transcription in each cell and calculated the percentage of cells showing 1, 2, 3, 4 or >4 sites. The percentage of cells showing a single site of transcription for TFF1 is very high in uninduced cells and decreases at 1 h. At 1 h, the percentage of cells showing 2, 3 and 4 sites of transcription increases, and it decreases again at 3 h (Author response image 2A). This agrees with the interpretation based on mean intronic counts away from the site of transcription. Similarly, for TFF3, the number of cells showing 2, 3 and 4 sites of transcription increases slightly at 3 h compared with uninduced cells and 1 h (Author response image 2B). Several cells also have no alleles firing at a given time, as quantified in the graphs on the right showing the total fraction of cells with zero versus non-zero alleles firing (Author response image 2A-B). A non-specific signal would be present in all cells.

      • There is literature beyond our work on post-transcriptional splicing of RNA suggesting that intronic signal can be found relatively far from the site of transcription. Waks et al. showed that some fraction of unspliced RNA could be observed up to 6-10 microns away from the site of transcription, suggesting that there can be a delay between transcription and (alternative) splicing (Waks et al., 2011). Pan-nuclear, dispersed intronic signals can arise because more than one allele can be firing at a time at different nuclear locations. The spread of intronic transcripts in our images is also limited in cells in which only one allele is firing after 1 h of E2 treatment (Author response image 2C) and in uninduced cells (Author response image 2D). Furthermore, Coté et al. note that “Of note, we see that increased transcription level correlates with intron dispersal, suggesting that the percentage of splicing occurring away from the transcription site is regulated by transcription level for at least some introns. This may explain why we observe posttranscriptional splicing of all genes we measured, as all were highly expressed.” This is in line with our interpretation that intronic signal can be dispersed when splicing occurs post-transcriptionally (Coté et al., 2023). Additionally, other studies have suggested that transcripts do not necessarily undergo co-transcriptional splicing, which leads us to conclude that intronic signal can be found farther away from the site of transcription. Coulon et al. showed that splicing can occur after transcript release from the site and suggested that no strict checkpoint exists to ensure intron removal before release, so that splicing and release are kinetically uncoupled (Coulon et al., 2014).
      Similarly, using live-cell imaging, Vargas et al. showed that splicing is not always coupled with transcription and that this can depend on the nature and structural features of the transcript (such as blockage of the polypyrimidine tract, which delays its recognition) (Vargas et al., 2011). Drexler et al. showed that, in contrast to the shorter transcripts of Drosophila, splicing of the terminal intron in mammalian cells can occur post-transcriptionally (Drexler et al., 2020). Using RNA polymerase II ChIP-seq time-course data from ERα activation in MCF-7 cells, Honkela et al. showed that a large number of genes can show significant delays between the completion of transcription and mRNA production (Honkela et al., 2015). This was attributed to faster transcription of shorter genes, suggesting that rapid completion of transcription on shorter genes can lead to splicing-associated delays (Honkela et al., 2015). More recently, comparisons of nascent and mature RNA levels suggested a time lag between transcription and splicing for genes that are early responders during signaling (Zambrano et al., 2020). The presence of substantial numbers of TFF1 nascent RNAs in the nucleus in our data is consistent with these observations.

      • Uniform intensities across many transcripts suggest that these are true signals arising from RNA molecules, which would not be the case for non-specific background signal (Author response image 2E).

      • Splicing occurs in the nucleus, so intron-containing precursor transcripts should be localized to the nucleus. Intronic signals should therefore remain nuclear, unlike mature mRNAs, which translocate to the cytoplasm after processing; exonic signals can thus be found in both the nucleus and the cytoplasm. Consistent with this, we observe no cytoplasmic signal for the intronic probes, which remain localized within the nucleus as expected (Author response image 2F), whereas exonic signals are observed in both compartments. This suggests that the signal comes from genuine precursor transcripts. There is no reason for non-specific background labeling to remain restricted to the nucleus.

      • The mean intronic label counts for both TFF1 and TFF3 increase upon E2 induction compared with the uninduced condition (Fig. 2B). Similarly, the mean intronic counts for both genes drop drastically in the TFF1-enhancer-deleted cells (Fig. 3C, D). This change in the number of intronic signals specifically upon induction and enhancer deletion suggests that the signal is not an artefact but arises from true nascent transcripts that are sensitive to the stimulus and to enhancer deletion.

      • We expect intronic signals to colocalize with exonic signals in the nucleus, while some exonic signals, representing more mature mRNA, should not colocalize with intronic signals. Indeed, we observe clear colocalization between intronic and exonic signals in the nucleus, while exonic signals also occur independently of intronic signals in both the nucleus and the cytoplasm. This demonstrates that the intronic signals in our experiments are specific and not simply background labeling (Author response image 2G).
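      The colocalization asymmetry described in the last point (nearly every intronic spot should have an exonic partner, but not every exonic spot an intronic one) can be quantified with a simple nearest-neighbor test. The sketch below is a hypothetical illustration, not the analysis behind Author response image 2G: the spot coordinates, the 0.3 micron matching radius and the function name `colocalized_fraction` are assumptions.

```python
import math


def colocalized_fraction(query_spots, reference_spots, max_distance):
    """Fraction of query spots with at least one reference spot within max_distance.

    Spots are (x, y, z) coordinates in the same physical units (assumed microns here).
    """
    if not query_spots:
        return 0.0
    hits = sum(
        any(math.dist(q, r) <= max_distance for r in reference_spots)
        for q in query_spots
    )
    return hits / len(query_spots)


# Hypothetical spot coordinates (microns) from one nucleus
intronic = [(1.0, 1.0, 1.0), (5.0, 5.0, 2.0), (9.0, 2.0, 3.0)]
exonic = [(1.1, 1.0, 1.0), (5.2, 5.1, 2.0), (9.0, 2.2, 3.0), (20.0, 20.0, 2.0), (14.0, 3.0, 1.0)]

print(colocalized_fraction(intronic, exonic, 0.3))  # intronic spots with an exonic partner
print(colocalized_fraction(exonic, intronic, 0.3))  # exonic spots with an intronic partner
```

      A fraction near 1 in the first direction and clearly below 1 in the second matches the expected pattern; in practice, a control in which one channel is randomly shifted would help estimate the overlap expected by chance.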

      These studies and the arguments above lead us to conclude that the presence of intronic transcripts in the nucleus away from the site of transcription is not an artefact. We hope the reviewer will agree. These analyses have now been included in the manuscript as Supplementary Figure 6 and are discussed at lines 106-111, 201-204, 215-217 and 231-235. We thank the reviewer for raising this important point.

      Author response image 2.

      Dynamic induction and RNA localization of TFF1 and TFF3 transcription across cell populations using smRNA FISH. A. Bar graph depicting the percentage of cells with 1, 2, 3, 4 or more than 4 sites of transcription for TFF1 (left). The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only cells with at least one allele firing were counted; cells with no active alleles were excluded. The graph on the right shows the number of cells with zero or non-zero alleles firing. B. Bar graph depicting the percentage of cells with 1, 2, 3, 4 or more than 4 sites of transcription for TFF3 (left). The graph shows the mean of means from different repeats of the experiment, and error bars denote SEM (n>200, N=3). Only cells with at least one allele firing were counted; cells with no active alleles were excluded. The graph in the middle shows the number of cells with 2, 3, 4 or more than 4 sites of transcription for TFF3. The graph on the right shows the number of cells with zero or non-zero alleles firing. C. Images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 in cells induced with E2 for 1 hour. When a single allele of TFF1 is firing, the transcripts show a more spatially restricted localization. The scale bar is 5 microns. D. Images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 in uninduced cells. When a single allele of TFF1 is firing and transcription is low, the transcripts show a more spatially restricted localization. The scale bar is 5 microns. E. A line profile through several transcripts in the nucleus shows uniform, similar intensities, indicating that these are true signals. F. Representative 60X images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1 (top) and InTFF3 and ExTFF3 (bottom). There is no intronic signal in the cytoplasm, while exonic signals are found in both the nucleus and the cytoplasm. The scale bar is 5 microns. G. Representative 60X images from a single-molecule RNA FISH experiment showing transcripts for InTFF1 and ExTFF1. All intronic signals colocalize with exonic signals, whereas, as expected, not all exonic signals colocalize with intronic signals; these represent more mature mRNA. The scale bar is 5 microns.
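      The quantification described for panels A and B (percentage of active cells with 1, 2, 3, 4 or >4 transcription sites, computed per replicate and then averaged, with SEM across replicates) can be sketched as follows. This is a minimal illustration under stated assumptions: the input format (a list of per-cell site counts for each replicate) and the helper names are hypothetical, and the toy numbers are invented for demonstration, not data from the study.

```python
import math
import statistics

BINS = ("1", "2", "3", "4", ">4")


def bin_label(n_sites):
    """Map a per-cell number of transcription sites to its histogram bin."""
    return ">4" if n_sites > 4 else str(n_sites)


def replicate_percentages(site_counts):
    """Percentage of active cells (>= 1 site) in each bin for one replicate."""
    active = [n for n in site_counts if n > 0]  # cells with no active allele are excluded
    return {b: 100.0 * sum(bin_label(n) == b for n in active) / len(active) for b in BINS}


def mean_of_means(replicates):
    """Mean and SEM (across replicates) of the per-replicate bin percentages."""
    per_replicate = [replicate_percentages(r) for r in replicates]
    summary = {}
    for b in BINS:
        values = [p[b] for p in per_replicate]
        sem = statistics.stdev(values) / math.sqrt(len(values))
        summary[b] = (statistics.mean(values), sem)
    return summary


def silent_fraction(site_counts):
    """Fraction of cells with zero alleles firing (the zero vs non-zero panels)."""
    return sum(n == 0 for n in site_counts) / len(site_counts)


# Invented toy input: per-cell site counts for N = 3 replicates
replicates = [[0, 1, 1, 2, 3], [1, 1, 2, 2, 0, 5], [1, 2, 2, 2, 6, 0, 0]]
for b, (mean, sem) in mean_of_means(replicates).items():
    print(f"{b:>2} sites: {mean:5.1f}% +/- {sem:4.1f}% (SEM)")
print([round(silent_fraction(r), 2) for r in replicates])
```

      Averaging per-replicate percentages (rather than pooling all cells) gives each repeat equal weight, so N in the SEM is the number of replicates, not the number of cells.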

      One substantial way to improve the manuscript is to take a careful look at previous single cell analysis of the estrogen response, which in some cases has been done on the exact same genes (PMID: 29476006, 35081348, 30554876, 31930333). In some of these cases, the authors reach different conclusions than those presented in the present manuscript. Likewise, there have been more than a few studies that have characterized these enhancers (the first one I know of is: PMID 18728018). Also, Oh et al. 2021 (cited in the manuscript) did show an interaction between TFF1e and TFF3, which seems to contradict the conclusion from Fig. 3. In summary, the results of this paper are not in dialogue with the field, which is a major shortcoming. 

      We thank the reviewer for pointing out these important studies. The studies from Prof. Larson's group are particularly insightful (Rodriguez et al., 2019). We have now included them in the Discussion (lines 106-111 and 420-424), where we discuss the differences and similarities between our findings and those of the Larson and Mancini groups (Patange et al., 2022; Stossi et al., 2020).

      The 4C-seq data from Oh et al. (2021) are fully consistent with our observation in Fig. 3: they also observed little to no interaction between TFF1e and TFF3p in WT cells, and only upon TFF1p deletion did TFF1e become engaged with TFF3p. In agreement with this, we also observe little to no interaction between TFF1e and TFF3p in WT cells (Fig. 3A). This is also consistent with our model of competition for resources between these two genes. Oh et al. show interaction between TFF1e and TFF3 when the TFF1 promoter is deleted, indicating that when the primary promoter is unavailable, the enhancer is retargeted to the next available gene (Oh et al., 2021). They do not show that TFF1e and TFF3 interact in WT cells at any time point of E2 signaling.

      In the opinion of this reviewer, there are few, if any, experiments to interrogate the existence of LLPS for diffraction-limited spots such as those associated with transcription. This difficulty is a general problem with the field and not specific to the present manuscript. For example, transient binding will also appear as a dynamic 'spot' in the nucleus, independently of any higher-order interactions. As for Fig. 5, I don't think treating cells with 1,6-hexanediol is any longer considered a credible experiment. For example, there are profound effects on chromatin independent of changes in LLPS (PMID: 33536240).

      We are cognizant of and appreciate the limitations pointed out by the reviewer. We and others have previously shown, using an ImmunoFISH assay, that ERα forms condensates at the TFF1 chromatin region (Saravanan et al., 2020). The data below show the relative mean ERα intensity on TFF1 FISH spots and on random regions, clearly showing the appearance of a condensate at the TFF1 site. Further, deletion of TFF1e reduces the size of this condensate. We therefore expect these ERα condensates to be characterized by higher-order interactions and to be disrupted by treatment with 1,6-hexanediol. As the reviewer notes, these condensates are below a micron in size, but most TF condensates are of similar size. We agree with the reviewer that 1,6-hexanediol treatment is a brute-force experiment that causes several irreversible changes to chromatin. We nevertheless used it at a low concentration for a short period of time, as it has been used in several previous studies (Chen et al., 2023; Gamliel et al., 2022). The opposite patterns of TFF1 and TFF3 expression upon 1,6-hexanediol treatment suggest that the effect has some specificity. ERα mutants (N-terminal IDR truncations) could also be used to perturb condensates; however, the transcriptional response of these mutants is also altered because recruitment of coactivators that recognize the ERα N-terminus is perturbed, which makes it difficult to distinguish ERα's transcriptional functions from condensate formation.

      References:

      Chen, L., Zhang, Z., Han, Q., Maity, B. K., Rodrigues, L., Zboril, E., Adhikari, R., Ko, S.-H., Li, X., Yoshida, S. R., Xue, P., Smith, E., Xu, K., Wang, Q., Huang, T. H.-M., Chong, S., & Liu, Z. (2023). Hormone-induced enhancer assembly requires an optimal level of hormone receptor multivalent interactions. Molecular Cell, 83(19), 3438-3456.e12. https://doi.org/10.1016/j.molcel.2023.08.027

      Coté, A., O’Farrell, A., Dardani, I., Dunagin, M., Coté, C., Wan, Y., Bayatpour, S., Drexler, H. L., Alexander, K. A., Chen, F., Wassie, A. T., Patel, R., Pham, K., Boyden, E. S., Berger, S., Phillips-Cremins, J., Churchman, L. S., & Raj, A. (2023). Post-transcriptional splicing can occur in a slow-moving zone around the gene. eLife, 12. https://doi.org/10.7554/eLife.91357.2

      Coulon, A., Ferguson, M. L., de Turris, V., Palangat, M., Chow, C. C., & Larson, D. R. (2014). Kinetic competition during the transcription cycle results in stochastic RNA processing. eLife, 3, e03939. https://doi.org/10.7554/eLife.03939

      Drexler, H. L., Choquet, K., & Churchman, L. S. (2020). Splicing Kinetics and Coordination Revealed by Direct Nascent RNA Sequencing through Nanopores. Molecular Cell, 77(5), 985-998.e8. https://doi.org/10.1016/j.molcel.2019.11.017

      Gamliel, A., Meluzzi, D., Oh, S., Jiang, N., Destici, E., Rosenfeld, M. G., & Nair, S. J. (2022). Long-distance association of topological boundaries through nuclear condensates. Proceedings of the National Academy of Sciences of the United States of America, 119(32), e2206216119. https://doi.org/10.1073/pnas.2206216119

      Honkela, A., Peltonen, J., Topa, H., Charapitsa, I., Matarese, F., Grote, K., Stunnenberg, H. G., Reid, G., Lawrence, N. D., & Rattray, M. (2015). Genome-wide modeling of transcription kinetics reveals patterns of RNA production delays. Proceedings of the National Academy of Sciences of the United States of America, 112(42), 13115. https://doi.org/10.1073/pnas.1420404112

      Oh, S., Shao, J., Mitra, J., Xiong, F., D’Antonio, M., Wang, R., Garcia-Bassets, I., Ma, Q., Zhu, X., Lee, J.-H., Nair, S. J., Yang, F., Ohgi, K., Frazer, K. A., Zhang, Z. D., Li, W., & Rosenfeld, M. G. (2021). Enhancer release and retargeting activates disease-susceptibility genes. Nature, 595(7869). https://doi.org/10.1038/s41586-021-03577-1

      Patange, S., Ball, D. A., Wan, Y., Karpova, T. S., Girvan, M., Levens, D., & Larson, D. R. (2022). MYC amplifies gene expression through global changes in transcription factor dynamics. Cell Reports, 38(4). https://doi.org/10.1016/j.celrep.2021.110292

      Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A., & Tyagi, S. (2008). Imaging individual mRNA molecules using multiple singly labeled probes. Nature Methods, 5(10). https://doi.org/10.1038/nmeth.1253

      Rodriguez, J., Ren, G., Day, C. R., Zhao, K., Chow, C. C., & Larson, D. R. (2019). Intrinsic Dynamics of a Human Gene Reveal the Basis of Expression Heterogeneity. Cell, 176(1–2), 213-226.e18. https://doi.org/10.1016/j.cell.2018.11.026

      Saravanan, B., Soota, D., Islam, Z., Majumdar, S., Mann, R., Meel, S., Farooq, U., Walavalkar, K., Gayen, S., Singh, A. K., Hannenhalli, S., & Notani, D. (2020). Ligand dependent gene regulation by transient ERα clustered enhancers. PLOS Genetics, 16(1), e1008516. https://doi.org/10.1371/journal.pgen.1008516

      Stossi, F., Dandekar, R. D., Mancini, M. G., Gu, G., Fuqua, S. A. W., Nardone, A., De Angelis, C., Fu, X., Schiff, R., Bedford, M. T., Xu, W., Johansson, H. E., Stephan, C. C., & Mancini, M. A. (2020). Estrogen-induced transcription at individual alleles is independent of receptor level and active conformation but can be modulated by coactivators activity. Nucleic Acids Research, 48(4), 1800. https://doi.org/10.1093/nar/gkz1172

      Vargas, D. Y., Shah, K., Batish, M., Levandoski, M., Sinha, S., Marras, S. A. E., Schedl, P., & Tyagi, S. (2011). Single-Molecule Imaging of Transcriptionally Coupled and Uncoupled Splicing. Cell, 147(5), 1054–1065. https://doi.org/10.1016/j.cell.2011.10.024

      Waks, Z., Klein, A. M., & Silver, P. A. (2011). Cell-to-cell variability of alternative RNA splicing. Molecular Systems Biology, 7(1), 506. https://doi.org/10.1038/msb.2011.32

      Zambrano, S., Loffreda, A., Carelli, E., Stefanelli, G., Colombo, F., Bertrand, E., Tacchetti, C., Agresti, A., Bianchi, M. E., Molina, N., & Mazza, D. (2020). First Responders Shape a Prompt and Sharp NF-κB-Mediated Transcriptional Response to TNF-α. iScience, 23(9), 101529. https://doi.org/10.1016/j.isci.2020.101529

    1. eLife Assessment

      This important study provides a detailed characterization of individual sarcomeres' contractility and of their synchrony in spontaneously beating cardiomyocytes derived from human induced pluripotent stem cells. The combination of high-resolution tracking, statistical analysis and mesoscopic modeling leads to compelling evidence that sarcomeres operate as dynamically unstable units, leading to stochastic heterogeneities in their contraction-elongation cycles depending on substrate stiffness. The work will be relevant to scientists interested in muscle biophysics, nonlinear dynamics and synchronization phenomena in biological systems.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors present comprehensive experimental observations and a theoretical framework to explain the heterogeneous behaviour of sarcomeres in cardiomyocytes. They show that a stochastic component exists in their contractile activity, which may act as a feedback mechanism regulating physiological function.

      Strengths:

      Experiments and data analysis are robust and valid. The rigorous statistical analysis and unbiased methods enable the authors to draw well-supported conclusions that go beyond the existing literature. Their results provide information about cellular activity at the individual level, and the authors explain how the transient dynamics of single sarcomeres are governed by a force-velocity relationship and lead to the complex contractile patterns. The similarity of the results to the study cited in [24] demonstrates the validity of the in vitro setup for answering these questions and the feasibility of such in vitro systems to extend our knowledge of out-of-equilibrium dynamics in cardiac cells.

      The suggestion that the interplay between intrinsic fluctuations and the dynamic instability is part of a feedback mechanism for maintaining structural and functional homeostasis is very interesting.

      The addition of the theoretical model and the new text of the manuscript improves the clarity of the study.

    3. Reviewer #2 (Public review):

      Summary:

      Sarcomeres, the contractile units of skeletal and cardiac muscle, contract in a concerted fashion to power myofibril and thus muscle fiber contraction.

      Muscle fiber contraction depends on the stiffness of the elastic substrate of the cell, yet it is not known how this dependence emerges from the collective dynamics of sarcomeres. Here, the authors analyze contraction time series of individual sarcomeres using live imaging of fluorescently labeled cardiomyocytes cultured on elastic substrates of different stiffness. They find that a reduced collective contractility of muscle fibers on unphysiologically stiff substrates is partially explained by a lack of synchronization in the contraction of individual sarcomeres.

      This lack of synchronization is at least partially stochastic, consistent with the notion of a tug-of-war between sarcomeres on stiff substrates. A particular irregularity of sarcomere contraction cycles is 'popping', the extension of sarcomeres beyond their rest length. The statistics of 'popping' suggest that this is a purely random process.

      Strengths:

      This study thus marks an important shift of perspective from whole-cell analysis towards an understanding of the collective dynamics of coupled, stochastic sarcomeres.

    4. Reviewer #3 (Public review):

      The manuscript by Haertter and coworkers studies the variation in the length of single sarcomeres and the response of myofibrils, made up of sarcomeres, of cardiomyocytes on soft gel substrates of varying stiffness.

      The measurements at the level of a single sarcomere are an important new result of this manuscript. They are done by combining labeling of the sarcomere Z-lines using genetic manipulation with a sophisticated tracking program based on machine learning. This single-sarcomere analysis shows strong heterogeneities: sarcomeres can show fast oscillations that are not synchronized with the average behavior of the cell, as well as what the authors call popping events, which are large-amplitude oscillations. Another important result is that cardiomyocyte contractility decreases with substrate stiffness, although the properties of single sarcomeres do not seem to depend on substrate stiffness.

      The authors suggest that the cardiomyocyte cell behavior is dominated by sarcomere heterogeneity. They show that the heterogeneity between sarcomeres is stochastic and that the contribution of static heterogeneity (such as composition differences between sarcomeres) is small.

      Strengths:

      All the results are, to my knowledge, new and original. The authors also made a theoretical model where each sarcomere is described by a Langevin equation based on a non-linear coupling between force and velocity of the sarcomeres. This model accounts well for the experimental results including the observation of what the authors call popping events.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study provides a valuable characterization of individual sarcomeres' contractility and synchrony in spontaneously beating cardiomyocytes as a function of substrate stiffness. The authors, however, provide an incomplete explanation for the observed heterogeneous and stochastic dynamics, so that the work remains mainly descriptive. The work will be of interest to scientists working on muscle biophysics, nonlinear dynamics, and synchronization phenomena in biological systems.

      We appreciate the reviewers’ insightful comments. A detailed explanation of the described phenomena in the form of a theoretical model and simulations was not included in our manuscript, because we believed it would be most impactful to present a detailed quantitative statistical description of the experiments in one manuscript and then introduce the model, which we already had in preparation, in a separate manuscript to avoid diluting the overall message.

      However, following the reviewers’ advice, we have now included a comprehensive model into the revised manuscript. This model qualitatively and quantitatively explains the experimentally observed phenomena and introduces a novel class of coupled relaxation oscillators based on a non-monotonic force-velocity relationship of individual sarcomeres. We believe that this addition significantly strengthens the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors experimentally demonstrated the heterogeneous behavior of sarcomeres in cardiomyocytes and that a stochastic component exists in their contractile activity, which cancels out at the level of myofibrils.

      Strengths:

      The experiments and data analysis are robust and valid. With very good statistics and unbiased methods, they show cellular activity at the individual level and highlight the heterogeneity between biological networks. The similarity of the results to the study cited in [24] demonstrates the validity of the in vitro setup for answering these questions and the feasibility of such in-vitro systems to extend our knowledge of physiology.

      Weaknesses:

      Compared to the current literature ([24]), the study does not show a high degree of innovation. It mainly confirms what has been established in the past. The authors complemented the published experiments by developing an in vitro setup with stem cells and by changing the stiffness of the substrate to simulate pathological conditions. However, the experiments they performed do not allow them to explain more than the study in [24], and the conclusions of their study are based on interpretation and speculation about the possible mechanism underlying the observations.

      We thank the reviewer for contextualizing our work with the literature. We appreciate the comparison to the study by Kobirumaki-Shimozawa et al., which we cite prominently. They observed stochastically varying beating patterns of individual sarcomeres on a beat-to-beat basis. They propose that this arises from a "titin-based mechanism" operating stochastically, which they interpret as being fundamentally linked to sarcomere-length-dependent effects. This interpretation differs from our model. We feel that the inclusion of our comprehensive model in the revised manuscript will emphasize the significance and novelty of our findings. Our work proposes a distinct alternative mechanistic explanation for the observed stochasticity, grounded in the force-velocity relationship and intrinsic stochasticity, and presents additional novel dynamic phenomena (such as popping and high-frequency oscillations) not yet reported in the literature. We outline the key advancements of our study below:

      (1) Physiologically Relevant Human Model System: Our study utilizes human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs). Using a human cell model provides direct relevance for understanding human cardiac physiology and pathophysiology, overcoming limitations inherent in translating findings from rodent models. The hiPSC-CMs exhibit key physiological differences from the mouse ventricular myocytes observed in [24], most notably beating at a significantly lower frequency (~1 Hz or 60 bpm) compared to mice (~5-8 Hz or 300-500 bpm). This difference in timescale is critical as it allowed us to resolve complex intra-beat dynamics that may be different and also harder to observe in mouse cardiomyocytes.

      (2) Advanced Experimental Methodology and Resolution: We developed a novel assay incorporating our SarcAsM algorithm for high-throughput tracking and analysis of individual sarcomere dynamics. This approach gave us spatial resolution better than 20 nm at significantly higher sampling rates than previous studies, including Kobirumaki-Shimozawa et al. Furthermore, our high-throughput in vitro approach made it possible to analyze vastly larger datasets than, e.g., the study by Kobirumaki-Shimozawa et al. (which reports observations from fewer than 20 myofibrils, encompassing less than 200 sarcomeres in total). While we recognize that in-vivo tissue studies present unique experimental challenges, the substantially greater statistical power of our study is crucial for reliably characterizing the complex, stochastic dynamics we report. The enhanced resolution and statistical robustness are not merely incremental; they enable the detailed identification and analysis of heterogeneous behaviors that were previously inaccessible or could not be characterized with the same level of confidence.

      (3) Novel Observed Phenomena: Our high-resolution data reveals specific dynamic behaviors, such as sarcomere "popping" and high-frequency oscillations during contraction, which, to our knowledge, have not been previously reported or characterized in cardiomyocytes. The resolution limitations and the high beating frequency in mouse models may not have permitted the observation of these subtle, but potentially important phenomena.

      (4) Distinct Mechanistic Explanation and Model: Kobirumaki-Shimozawa et al. propose a qualitative model where sarcomere motion variability primarily arises from length-dependent activation. This view is essentially a static one, based on a long history of isometric skeletal muscle experiments, where time-dependent forces are not relevant. We argue that in highly dynamic cardiomyocytes this may not be the most useful approach. While we acknowledge length dependence can play a role, our integrated experimental-theoretical work proposes a different primary mechanism. Our model demonstrates that the observed stochastic heterogeneity and beat-to-beat variations, including the oscillatory motion and popping, can be quantitatively explained by dynamic instabilities arising from a non-monotonic force-velocity relationship of individual sarcomeres in conjunction with intrinsic sarcomere-level stochastic fluctuations. The model emphasizes the active, transient nature of force generation rather than solely assuming length dependence. Our model provides an alternative explanation for the observed dynamics, and a quantitative, mechanism-based understanding.
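      The qualitative mechanism in point (4), a force-velocity relation with a negative-slope branch combined with noise, can be illustrated with a generic toy model. The sketch below is not the model from the manuscript. It assumes a single sarcomere with length deviation x and velocity v, a hypothetical cubic force-velocity curve f(v) = v**3/3 - v, a linear elastic restoring force -k*x, a small regularizing parameter eps and additive white noise of strength sigma, i.e. eps*dv = (-k*x - f(v))*dt + sigma*dW and dx = v*dt. For small eps this is a Rayleigh-type relaxation oscillator: v cannot stay on the unstable branch |v| < 1, so the dynamics alternate between slow drifts along the stable branches and fast jumps between them.

```python
import math
import random


def force_velocity(v):
    """Hypothetical non-monotonic force-velocity curve; its slope is negative for |v| < 1."""
    return v ** 3 / 3.0 - v


def simulate(k=1.0, eps=0.01, sigma=0.0, dt=1e-3, t_end=40.0, x0=0.5, v0=0.0, seed=0):
    """Toy model (not the authors'): Euler-Maruyama-type integration of
    eps*dv = (-k*x - f(v))*dt + sigma*dW, dx = v*dt."""
    rng = random.Random(seed)
    x, v = x0, v0
    xs, vs = [], []
    for _ in range(round(t_end / dt)):
        d_w = math.sqrt(dt) * rng.gauss(0.0, 1.0)  # Wiener increment
        v += ((-k * x - force_velocity(v)) * dt + sigma * d_w) / eps
        x += v * dt  # uses the updated velocity (semi-implicit step)
        xs.append(x)
        vs.append(v)
    return xs, vs


def sign_changes(series):
    """Number of zero crossings, used here as a crude oscillation count."""
    return sum(1 for a, b in zip(series, series[1:]) if a * b < 0)


xs, vs = simulate()                   # deterministic limit cycle
noisy_xs, _ = simulate(sigma=0.05)    # same toy model with additive noise
half = len(xs) // 2
print("zero crossings (second half):", sign_changes(xs[half:]))
print("max |x| (second half):", round(max(abs(x) for x in xs[half:]), 2))
```

      With sigma > 0 the jump times vary from cycle to cycle, a cartoon of beat-to-beat variability; the toy model says nothing about how coupled sarcomeres or substrate stiffness enter, which is what the authors' full model addresses.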

      Reviewer #2 (Public Review):

      Summary:

      Sarcomeres, the contractile units of skeletal and cardiac muscle, contract in a concerted fashion to power myofibril and thus muscle fiber contraction.

      Muscle fiber contraction depends on the stiffness of the elastic substrate of the cell, yet it is not known how this dependence emerges from the collective dynamics of sarcomeres. Here, the authors analyze the contraction time series of individual sarcomeres using live imaging of fluorescently labeled cardiomyocytes cultured on elastic substrates of different stiffness. They find that reduced collective contractility of muscle fibers on unphysiologically stiff substrates is partially explained by a lack of synchronization in the contraction of individual sarcomeres.

      This lack of synchronization is at least partially stochastic, consistent with the notion of a tug-of-war between sarcomeres on stiff substrates. A particular irregularity of sarcomere contraction cycles is 'popping', the extension of sarcomeres beyond their rest length. The statistics of 'popping' suggest that this is a purely random process.

      Strengths:

      This study thus marks an important shift of perspective from whole-cell analysis towards an understanding of the collective dynamics of coupled, stochastic sarcomeres.

      Weaknesses:

      Further insight into mechanisms could be provided by additional analyses and/or comparisons to mathematical models.

      We thank the reviewer for the feedback. We have enhanced the manuscript with a comprehensive dynamic model, which we also contrast with previously proposed models.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript of Haertter and coworkers studied the length variation of single sarcomeres and the response of the myofibrils, made up of sarcomeres, in cardiomyocytes cultured on soft gel substrates of varying stiffness.

      The measurements at the level of a single sarcomere are an important new result of this manuscript. They are done by combining genetic labeling of the sarcomere Z-lines with a sophisticated machine-learning-based tracking program. This single-sarcomere analysis shows strong heterogeneities: sarcomeres can show fast oscillations that are not synchronized with the average behavior of the cell, as well as what the authors call popping events, which are large-amplitude oscillations. Another important result is that cardiomyocyte contractility decreases with increasing substrate stiffness, although the properties of single sarcomeres do not seem to depend on substrate stiffness.

      The authors suggest that the cardiomyocyte cell behavior is dominated by sarcomere heterogeneity. They show that the heterogeneity between sarcomeres is stochastic and that the contribution of static heterogeneity (such as composition differences between sarcomeres) is small.

      Strengths:

      All the results are to my knowledge new and original and deserve attention.

      Weaknesses:

      However, I find the manuscript a bit frustrating because the authors only give very qualitative explanations of the phenomena that they observe. They mention that popping could be explained by a nonlinear force-velocity relation of the sarcomere leading to a rapid detachment of all motors. However, they do not explicitly provide a theoretical description. How would the popping depend on the parameters, and in particular on the substrate stiffness? Would the popping statistics be affected by the stiffness? It is also not clear to me how the dependence of the cardiomyocyte's behavior on the stiffness of the soft gel can be explained by the stochasticity of the sarcomere properties. Can any of the results found by the authors be explained by existing theories of cardiomyocytes? The only one I know is that of Safran and coworkers.

      I also found the paper very difficult to read. The authors should perhaps reorganize the structure of the presentation in order to highlight what the new and important results are.

      We are grateful for this detailed and critical feedback. The observed phenomena (stochastic heterogeneity, popping, high-frequency oscillatory motion) can indeed be explained by a non-monotonic force-velocity relation along with stochastic fluctuations of individual sarcomeres. At the time of initial submission of this manuscript, we already had a theoretical model in preparation, which both qualitatively and quantitatively explains the observed phenomena. As a result, we included certain interpretations preemptively, which caused some lack of clarity in the absence of the full model. We have now added the model to this manuscript, providing a mechanistic interpretation of our findings. The model differs from prior models in that it emphasizes time-dependent forces, which are typically disregarded in models built to understand isometric skeletal muscle experiments.

      We have shortened, streamlined and restructured our manuscript to improve the readability and accessibility of our study.

      Recommendations for the authors:

      There is a consensus among reviewers that the link between the stiffness dependence of the observed stochastic dynamics and the proposed tug-of-war mechanism is unclear. More quantitative support and discussion are required, possibly using theoretical modeling.

      We are grateful for the insightful and comprehensive feedback by both editor and reviewers. As suggested, we have now added a comprehensive model explaining the observed phenomena and presenting a new conceptual view on cardiac muscle dynamics.

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed an interesting question related to the dynamics of cardiac cells and their multiscale dynamics. They did a good job in terms of experimental design and data analysis. However, I fear that they do not contribute enough new information to the topic.

      The authors should refer to the study in [24] and explain better the difference between these two studies. Although the different approaches are quite obvious, it is not clear to me what additional insights they add to the problem. They conducted their experiments with different stiffnesses. However, the conclusions they draw from the study are based on speculation (e.g. about the behavior of myosin heads in relation to shortening and relaxation), while their data mainly confirm previous studies. They need to address more explicitly the novelty of their study.

      Novelty and Comparison with Previous Studies: We understand the concern about distinguishing our contribution from prior work, specifically Kobirumaki-Shimozawa et al., 2021.

      As detailed in our public response, these are the key advances:

      Use of a medically relevant human iPSC-CM model vs. mouse cardiomyocytes.

      Superior spatial and temporal resolution via our SarcAsM algorithm, revealing novel phenomena like popping and high-frequency oscillations not previously reported.

      Significantly greater statistical power due to our high-throughput in vitro assay.

      We added a distinct mechanistic explanation based on the dynamic force-velocity relationship and sarcomere-level stochasticity, contrasting with the static, deterministic titin/length-dependence focus of previous studies.

      Interpretation and Speculation: We acknowledge that without the explicit model, some interpretations in the initial submission appeared speculative. As noted in our public response, we had already started to develop a theoretical model explaining our observations at the time of submission, targeting a second follow-up publication. Including interpretations based on this unpublished model prematurely clearly caused confusion. We now include the full model in the revised manuscript.

      Integration of the Theoretical Model: We have now fully integrated the model into the revised manuscript. The model explicitly demonstrates how the non-monotonic force-velocity relationship of individual sarcomeres leads to dynamic instabilities around a critical force threshold. This instability along with stochasticity drives a 'tug-of-war' between coupled sarcomeres, generating complex emergent behaviors.

      Mechanistic Explanation Beyond Length-Dependence: Our model quantitatively reproduces all key experimental findings (stochastic heterogeneity, popping, oscillations) without relying on length-dependent activation effects. This strongly supports our conclusion that the active, transient dynamics of individual sarcomeres governed by the force-velocity relationship are fundamental drivers of these complex contractile patterns. We believe this provides a significant conceptual advance, highlighting a potentially underappreciated aspect of sarcomere dynamics. Previous models focused mostly on length-dependence, historically based on skeletal muscle fiber experiments that were often done under static, isometric conditions. We feel that the new model represents a substantial paradigm shift in understanding highly dynamic muscles such as heart muscle.

      We are confident that the inclusion of the model addresses the majority of the reviewer's concerns.

      Additional comments:

      The authors write of a tug-of-war competition between the sarcomeres, and I'm not sure what they mean by that. I would spend more words explaining this point, especially because it seems to be an important point to describe their results. Similarly, they talked about an all-or-nothing phenomenon when they described the elongation of sarcomeres. What do they mean by this?

      We have revised the manuscript where clarification was needed and now define the terms mentioned more explicitly.

      (1) "Tug-of-War": We used this term metaphorically to describe the mechanical competition between linearly coupled sarcomeres within a myofibril, especially when contracting against rigid external boundary conditions. While it is not a perfect analogy, the metaphor intuitively captures the inherent instability of this interaction: similar to how a team in a real tug-of-war might suddenly yield when one person tires and the rest of the team gets overloaded, rather than steadily losing ground, the dynamic instability arising from the non-monotonic force-velocity relationship (detailed in our model, lines 300ff) can cause individual sarcomeres to abruptly change state (e.g., shorten or rapidly lengthen) while under tension from their neighbors. We have removed the term from the title and now use it more sparingly within the manuscript to better reflect its role as an illustrative analogy.

      (2) "All-or-Nothing" Elongation (Popping): The term "popping" describes our experimental observation of sudden, rapid, and extensive elongation of individual sarcomeres. This typically occurs late in the contraction cycle during early relaxation, when overall force may be declining, but individual sarcomeres can still experience significant tension from their neighbors. We described this specific type of rapid elongation in the original manuscript as an "all-or-nothing" phenomenon because, typically, sarcomeres in these events yield rapidly and strongly overshoot their resting length without recovering in a given activation cycle. The speed of popping events is substantially higher than the speed of the coordinated, gradual shortening driven by bound myosin heads during systole. This observation strongly suggests an instability-driven, avalanche-like unbinding of myosin heads from the actin filaments during these events.

      We agree that the term "all-or-nothing" is not precise, and we have removed it, as it is not essential for describing the observed "popping" dynamics.

      The authors claim that the popping frequency increases as a function of stiffness. However, the way Figure 4E presents statistical significance does not follow common practice. A better description could help to remove this doubt.

      We clarified the presentation of popping frequency data and its statistical interpretation.

      (1) Popping Frequency vs. Substrate Stiffness (previously Figure 4D, now Figure 3G):

      We first note that the dependence of popping frequency on substrate stiffness was presented in Figure 4D, not 4E. In the revised, shortened manuscript it can now be found in Fig. 3G. Owing to the large number of observations (N) in our dataset, the slight upward trend in popping frequency with increasing substrate stiffness shown in Figure 4D does reach statistical significance with standard tests. For details, see the figure captions.

      (2) Popping Frequency vs. Sarcomere Resting Length (previously Figure 4E, now Figure 3H):

      Figure 4E addresses the relationship between popping frequency and the individual sarcomere's resting length. To generate this plot, we binned sarcomeres based on their measured resting length (in intervals of 0.02 µm) and calculated the mean popping frequency within each bin across all conditions. We have now clarified this in the figure caption.
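      The binning procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' SarcAsM code: the helper name mean_popping_by_rest_length is hypothetical, a sarcomere's popping frequency is assumed to be the fraction of its contraction cycles that contain a popping event, and the first bin edge is assumed to sit at the shortest resting length unless an origin is given.

```python
import math
from collections import defaultdict


def mean_popping_by_rest_length(rest_lengths_um, popping_freqs, bin_width_um=0.02, origin_um=None):
    """Average per-sarcomere popping frequency within fixed-width resting-length bins.

    Hypothetical helper: popping_freqs[k] is assumed to be the fraction of
    contraction cycles in which sarcomere k popped. Returns a list of
    (bin centre in um, mean popping frequency) for every non-empty bin.
    """
    if origin_um is None:
        origin_um = min(rest_lengths_um)  # assumed: first bin starts at the shortest sarcomere
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length, freq in zip(rest_lengths_um, popping_freqs):
        b = math.floor((length - origin_um) / bin_width_um)  # index of the 0.02 um bin
        sums[b] += freq
        counts[b] += 1
    return [(origin_um + (b + 0.5) * bin_width_um, sums[b] / counts[b]) for b in sorted(counts)]


# Toy input (not real data): four sarcomeres pooled across conditions.
result = mean_popping_by_rest_length([1.601, 1.605, 1.625, 1.661], [0.1, 0.3, 0.2, 0.4], origin_um=1.60)
print(result)
```

      Computing bin indices directly with floor, rather than building an explicit array of edges, avoids silently dropping the longest sarcomere when floating-point rounding shortens the last edge; empty bins are simply omitted.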

      (3) Interpretation of Length Dependence:

      While Figure 3H clearly shows that longer sarcomeres are more prone to popping, we argue this is likely a modulating factor rather than the sole underlying cause. Two key observations support this interpretation:

      Even very short sarcomeres (e.g., < 1.65 µm resting length) exhibit a non-zero popping frequency (around 5-10%), indicating that popping is not exclusive to long sarcomeres.

      The distribution of resting lengths, now added to the graph, is narrower than the wide range (1.6-2.0 µm) plotted in Figure 3H. Popping still occurs stochastically within a myofibril of sarcomeres with relatively similar resting lengths.

      Therefore, while length clearly influences the probability of popping, the phenomenon itself appears to be fundamentally stochastic, occurring across a range of lengths. This is consistent with our model, in which dynamic instabilities (driven by the non-monotonic force-velocity relationship) and stochastic fluctuations are the primary triggers, while length affects the probability of occurrence.

      Changes in Manuscript:

      We have revised the text associated with Figures 3G and 3H to clarify the distinction between stiffness and length dependence.

      We have added a statement in the Methods section and figure legends (e.g., Legend for Fig 3) explaining our approach to statistical analysis and interpretation for large datasets where standard p-values may be less informative.

      We believe these clarifications directly address the reviewer's concerns about the data presentation and interpretation in Figure 3.

      Reviewer #2 (Recommendations For The Authors):

      This is an interesting study, which however could and should be extended, see below. The current manuscript contains much less information than its length suggests; its figures contain partially redundant data.

      Taking into account this critical feedback, we have restructured, streamlined and shortened the manuscript to improve readability and accessibility.

      (1) How regular are the cellular contraction cycles?

      Have the authors computed a coefficient of variation of cycle durations?

      Does this regularity depend on substrate stiffness?

      We have substantially improved the detection accuracy of contraction intervals compared to our initial submission (for details, see the SarcAsM preprint, https://www.biorxiv.org/content/10.1101/2025.04.29.650605v1). We calculated the beating rate variability (defined as the standard deviation of cycle durations) and found it to be low, on average less than 0.05 s across the tested conditions. The distribution of this variability is positively skewed, with the majority of values clustering near zero. We have added new panels showing these results to Fig. S2B.
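      A minimal sketch of the two regularity measures discussed here: the standard deviation of cycle durations reported by the authors, and the coefficient of variation (SD divided by mean duration) asked for by the reviewer. The function name and the toy onset times are hypothetical, and the sketch uses the sample SD (n - 1 denominator); whether the authors used the sample or population SD is not stated.

```python
import statistics


def beating_variability(cycle_start_times_s):
    """Return (SD in s, CV) of contraction-cycle durations.

    cycle_start_times_s: start times (s) of successive contraction cycles, e.g.
    as produced by a cycle detector. Needs at least three onsets (two durations).
    """
    durations = [b - a for a, b in zip(cycle_start_times_s, cycle_start_times_s[1:])]
    sd = statistics.stdev(durations)  # sample SD of cycle durations
    cv = sd / statistics.fmean(durations)  # dimensionless coefficient of variation
    return sd, cv


# Toy trace (not real data): four onsets, i.e. durations of 1.0 s, 1.1 s and 0.9 s.
sd, cv = beating_variability([0.0, 1.0, 2.1, 3.0])
print(f"SD = {sd:.3f} s, CV = {cv:.3f}")
```

      For this toy trace the mean duration is 1 s, so SD and CV both come out at 0.1; for real cells beating at different rates the CV is the more comparable number, since it does not scale with the mean beating period.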

      (2) Which experiments could the authors perform to identify the origin of the apparent 3-Hz oscillations?

      Would these oscillations persist even if the cardiomyocytes would not beat?

      We now address these questions in the revised manuscript.

      (1) Active Nature: The ~3 Hz oscillations are clearly linked to active contraction. They are absent in quiescent, non-beating cardiomyocytes observed under identical conditions, confirming that they are not passive fluctuations or baseline cellular tremors.

      (2) Signal Fidelity: We are confident these are genuine physiological events, not artifacts. Our high temporal resolution (~15 ms frame time) and tracking accuracy (< 20 nm) allow reliable detection because events are well above system noise. This is now explained in the revised manuscript.

      (3) Can the authors augment their study by modeling?

      For example, could the experimental data be fitted by a Kuramoto-type model of the form dφ_i/dt = ε sin(Ω - φ_i) + λ sin(φ_i - φ_{i+1}) + ξ_i, combining phase-locking of sarcomere oscillations with phase φ_i to intracellular calcium oscillations with phase Ω, anti-phase synchronization between neighboring sarcomeres, and noise ξ_i?

      If yes, how would the coupling strength depend on substrate stiffness?
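      The phase model proposed in this question can be integrated with a simple Euler-Maruyama scheme. This is a hypothetical sketch for exploring the suggestion, not an analysis performed by the authors: the uniformly advancing calcium phase Ω(t) = 2π f t, the free boundary at the last sarcomere (which has no right-hand neighbor), Gaussian white noise for ξ_i, and all parameter values are assumptions. The helper locking_index measures how tightly the sarcomere phases are locked to Ω (1 means all sarcomeres share the same lag).

```python
import numpy as np


def simulate_phase_chain(n=20, drive_hz=1.0, eps=10.0, lam=0.5, noise_sd=0.1,
                         dt=1e-3, t_end=10.0, seed=0):
    """Euler-Maruyama integration of the reviewer's phase model.

    d(phi_i)/dt = eps*sin(Omega - phi_i) + lam*sin(phi_i - phi_{i+1}) + xi_i,
    with Omega(t) = 2*pi*drive_hz*t (assumed uniformly advancing calcium phase).
    Returns phases of shape (n_steps, n) and Omega of shape (n_steps,).
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    phi = rng.uniform(0.0, 2.0 * np.pi, n)  # random initial phases
    phases = np.empty((n_steps, n))
    omega = 2.0 * np.pi * drive_hz * dt * np.arange(n_steps)
    for k in range(n_steps):
        phases[k] = phi
        coupling = np.zeros(n)
        # lam > 0 is repulsive between neighbours, favouring anti-phase; the last
        # sarcomere has no right-hand neighbour (assumed free end).
        coupling[:-1] = lam * np.sin(phi[:-1] - phi[1:])
        drift = eps * np.sin(omega[k] - phi) + coupling
        phi = phi + drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal(n)
    return phases, omega


def locking_index(phases_t, omega_t):
    """|mean over sarcomeres of exp(i(phi_j - Omega))| at one time point."""
    return float(np.abs(np.mean(np.exp(1j * (phases_t - omega_t)))))


phases, omega = simulate_phase_chain()
print(locking_index(phases[-1], omega[-1]))
```

      With λ = 0 and no noise, each phase converges to the same lag behind Ω as long as ε exceeds the angular drive frequency (here 10 > 2π), so the locking index approaches 1. Fitting ε and λ per stiffness condition would be one way to ask the question posed above, although, as the authors note below, such a phase model does not represent the serial mechanical coupling explicitly.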

      We now added a model. While a Kuramoto-type phase model is powerful for studying synchronization, we determined that a more mechanistic approach was required. Crucially, sarcomeres are mechanically coupled in series within a myofibril, and this direct physical linkage is not well-represented by the abstract, phase-based coupling of a Kuramoto model.

      Instead, our model comprises serially coupled sarcomeres, each governed by an underdamped Langevin equation. This framework allowed us to infer the force-velocity relation directly from our experimental data without any prior assumptions, revealing a critical non-monotonic characteristic. As we now emphasize in the revised manuscript, this behavior is mathematically equivalent to a Van der Pol relaxation oscillator, which reflects the instability-driven nature of the system.

      Furthermore, and in line with the reviewer's suggestion, our model incorporates a stochastic noise term which we found essential for reproducing the observed phenomena. Without this noise term, the characteristic sarcomere dynamics do not emerge (Fig. 5).
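      The structure of such a model (serially coupled sarcomeres, inertia, a non-monotonic force-velocity relation, and noise) can be illustrated with a minimal sketch. It is emphatically not the authors' fitted model: the tension law (a Gaussian-windowed active term plus viscous and elastic terms), the clamped-end and substrate-spring boundary conditions, constant activation, and all dimensionless parameter values are hypothetical choices made only so that the integration is stable. Z-disc positions obey m du = F dt + σ dW, integrated with an Euler-Maruyama step; because each sarcomere's tension acts on its two flanking Z-discs, the sarcomeres are coupled in series.

```python
import numpy as np


def tension(v, l, l0=1.0, k_p=5.0, f0=1.0, a=5.0, vc=1.0, eta=0.5):
    """Hypothetical sarcomere tension (positive = pulling the two Z-discs together).

    v is the lengthening velocity dl/dt and l the sarcomere length. The active
    term f0 * (1 + a * v * exp(-(v / vc)**2)) rises and then falls with |v|;
    with these parameter values, adding the viscous term eta * v leaves a band
    of negative slope at intermediate |v|, i.e. a non-monotonic force-velocity curve.
    """
    active = f0 * (1.0 + a * v * np.exp(-(v / vc) ** 2)) + eta * v
    passive = k_p * (l - l0)
    return active + passive


def simulate_chain(n_sarc=10, l0=1.0, k_sub=1.0, mass=1.0, sigma=1.0,
                   dt=1e-3, n_steps=10_000, seed=0):
    """Euler-Maruyama integration of Z-discs obeying underdamped Langevin dynamics.

    Z-disc 0 is clamped at x = 0; the last Z-disc is tethered by a substrate
    spring of stiffness k_sub. Returns sarcomere lengths, shape (n_steps, n_sarc).
    """
    rng = np.random.default_rng(seed)
    x = l0 * np.arange(n_sarc + 1, dtype=float)  # Z-disc positions, start at rest length
    u = np.zeros(n_sarc + 1)                      # Z-disc velocities
    x_end_rest = x[-1]
    lengths = np.empty((n_steps, n_sarc))
    for step in range(n_steps):
        t_sarc = tension(np.diff(u), np.diff(x), l0=l0)
        force = np.zeros(n_sarc + 1)
        force[:-1] += t_sarc  # each sarcomere pulls its left Z-disc to the right...
        force[1:] -= t_sarc   # ...and its right Z-disc to the left (serial coupling)
        force[-1] -= k_sub * (x[-1] - x_end_rest)  # substrate spring on the free end
        kicks = sigma * np.sqrt(dt) * rng.standard_normal(n_sarc + 1)  # sigma * dW
        u += (force * dt + kicks) / mass
        u[0] = 0.0           # keep Z-disc 0 clamped
        x += u * dt          # semi-implicit Euler: positions use the updated velocities
        lengths[step] = np.diff(x)
    return lengths


lengths = simulate_chain()
print(lengths.shape, lengths.mean())
```

      Varying k_sub in a sketch like this is one way to see how substrate stiffness enters through the boundary condition; the parameters here are not fitted to any data, and the sketch makes no claim about reproducing popping statistics.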

      (4) What is the maximally extended length of titin, and how does this length correspond to the maximal length of popping sarcomeres?

      The force-extension curves of titin have been measured in single-molecule experiments (and the packing density of titin is known) - can the authors use this information to infer the forces acting inside sarcomeres?

      We thank the reviewer for this thoughtful question. While sarcomere length during popping can be measured, inferring the corresponding intra-sarcomeric force is not straightforward in a living, contracting cardiomyocyte. The relationship between extension and force is complex and dynamic, involving multiple molecular components.

      Our data show elongations up to 0.5 μm during popping events. While this magnitude is plausibly within the extensibility range of titin and other mechanically relevant components (Caporizzo & Prosser, 2021; Loescher & Linke, 2023), directly inferring force from this observation is challenging. In such a multi-component system with both active and passive elements, total force comprises several factors that cannot be disentangled from a simple length measurement alone. First, the system is dominated by active, velocity-dependent force generation of cross-bridges, which our model shows is non-monotonic. Second, titin exhibits a restoring force that is strongly strain-rate dependent (Rief et al., 1997), critical during rapid elongation. Third, viscous drag forces within the sarcomere are also highly strain-rate dependent, contributing significantly during rapid length changes. Fourth, other structural elements such as microtubules and intermediate filaments contribute to viscoelastic properties, particularly at high strains (Caporizzo & Prosser, 2021). This complex interplay makes it impossible to map a given sarcomere length to a unique force value using single-molecule titin data alone.

      (5) I urge the authors to make their raw data openly available.

      We agree on the importance of data availability. While the complete raw imaging dataset is several hundred gigabytes and thus impractical to deposit, we have uploaded a comprehensive dataset to Zenodo to ensure full reproducibility. This repository includes a representative subset of raw imaging data (50 cells per condition), with corresponding sarcomere motion data provided in a readable JSON format. Crucially, the deposition also contains the complete aggregated data underlying all figures and statistical analyses presented in the manuscript. All provided data can be programmatically accessed and analyzed using our `SarcAsM` Python API. The data can be accessed at: https://doi.org/10.5281/zenodo.17564384.

      Minor

      (1) How did the authors determine the start and end of contraction cycles when analyzing their data?

      The start and end points of each contraction cycle were identified using ContractionNet, a custom convolutional neural network we developed for this purpose. This method, used for all analyses in the revised manuscript, detects contraction intervals with high accuracy directly from sarcomere dynamics time-series data and significantly outperforms the threshold-based approach used previously. The complete methodology, algorithm description, and validation of ContractionNet are detailed in our companion paper on the SarcAsM analysis software (www.biorxiv.org/content/10.1101/2025.04.29.650605v1, see Fig. S6).

      (2) What are the measurement errors in determining Delta_SL?

      The measurement error for the Z-band trajectories is approximately 17 nm. This high tracking accuracy is achieved with our deep-learning-based Z-band segmentation approach, which employs a 3D convolutional neural network (3D U-Net) to leverage both spatial and temporal context for robust Z-band segmentation in noisy, high-speed recordings. A full description of this validation is available in our SarcAsM companion paper (see Figure S3 therein).
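      As a rough worked estimate, assume that the two Z-bands bounding a sarcomere each carry an independent position error of about 17 nm and treat the rest length as exact (both are assumptions; correlated errors or a noisy rest-length estimate would change the number). Standard error propagation for a difference of two positions then gives the error on the sarcomere length change:

```latex
\sigma_{\Delta SL} \approx \sqrt{\sigma_{Z,1}^{2} + \sigma_{Z,2}^{2}} = \sqrt{2}\,\sigma_{Z} \approx \sqrt{2} \times 17\ \mathrm{nm} \approx 24\ \mathrm{nm}
```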

      (3) Does popping occur while other sarcomeres are still contracting?

      This is an important point. Yes, popping frequently occurs while other sarcomeres within the same myofibril are still actively shortening. This simultaneity is clearly visualized in the newly added Movie M1, which displays a phase-space plot (velocity vs. length change relative to rest) for all tracked sarcomeres over time. In this visualization, popping events appear as trajectories moving into the top-right quadrant (rapid elongation), while concurrently, other sarcomeres are represented by points in the left quadrants (negative velocity), indicating ongoing shortening. We have included Movie M1 as supplementary material.

      (4) The authors argue that their data on popping sarcomeres is consistent with homogeneous popping probabilities.

      (5) Can the authors assess in simulations how dispersed the popping probabilities of individual sarcomeres could be before they would notice a statistically significant difference to the homogeneous case?

      This question touches on a key challenge in analyzing these complex dynamics. A direct statistical test of popping probability for each individual sarcomere is not feasible, as the number of events per sarcomere over our observation time is too low for robust single-unit analysis. Consequently, our approach relies on testing the cumulative distributions of inter-event spatial distances and temporal gaps across all sarcomeres within a given region (LOI).

      In nearly half of the analyzed LOIs, these cumulative distributions were statistically indistinguishable (p > 0.05) from the geometric distribution expected for a single, homogeneous stochastic process. This provides strong support for our primary conclusion that popping is fundamentally a random phenomenon.

      For the cases that deviate from the homogeneous model, we argue that this does not refute the underlying stochasticity of the events. Instead, we propose this is the expected statistical signature of pooling data from a population of sarcomeres that have slight, intrinsic variations in their individual popping probabilities due to factors like resting length or structural integrity. Even if each sarcomere's popping is a locally random event, a cumulative test performed on a population with varied baseline probabilities is expected to detect a deviation from a simple, homogeneous model.

      Regarding the requested simulation study: While we agree this would be methodologically informative, the sensitivity to detect probability dispersion depends on multiple interacting factors (number of sarcomeres per LOI, observation time, event rates, and the assumed form of heterogeneity). Any single simulation scenario would therefore be highly model-dependent and of limited generality. Rather than introducing additional assumptions, we base our conclusions on the observed agreement with the homogeneous model in approximately half of LOIs and the correlation of deviations with measurable properties (Fig. 4E). A comprehensive statistical analysis would constitute a substantial methodological study beyond the scope of this mechanistically focused manuscript.
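      The pooling argument above can be illustrated with a small simulation. This hypothetical sketch is not the analysis reported in the manuscript: each sarcomere is assumed to pop independently in every cycle with its own probability, and the probability values, the numbers of sarcomeres and cycles, and the dispersion ratio used as a summary statistic are illustrative choices. For a single homogeneous Bernoulli process, the gaps between pops follow a geometric distribution on {1, 2, ...} whose variance is mu * (mu - 1) for mean mu, so the ratio of observed variance to that value is close to 1. Pooling sarcomeres with different probabilities produces a mixture of geometric laws, whose variance exceeds that of a single geometric law with the same mean.

```python
import random
import statistics


def pooled_gaps(popping_probs, n_cycles, rng):
    """Inter-pop gaps (in cycles), pooled over all sarcomeres of one region.

    Hypothetical model: sarcomere k pops independently in every cycle with
    probability popping_probs[k] (one Bernoulli process per sarcomere).
    """
    gaps = []
    for p in popping_probs:
        pop_cycles = [c for c in range(n_cycles) if rng.random() < p]
        gaps.extend(b - a for a, b in zip(pop_cycles, pop_cycles[1:]))
    return gaps


def dispersion_ratio(gaps):
    """Sample variance divided by the variance of a geometric law with the same mean."""
    mu = statistics.fmean(gaps)
    return statistics.pvariance(gaps, mu) / (mu * (mu - 1))


rng = random.Random(1)
n_sarc, n_cycles = 200, 2000
homogeneous = pooled_gaps([0.1] * n_sarc, n_cycles, rng)
heterogeneous = pooled_gaps([rng.uniform(0.02, 0.18) for _ in range(n_sarc)], n_cycles, rng)
print(round(dispersion_ratio(homogeneous), 2), round(dispersion_ratio(heterogeneous), 2))
```

      With these values the first ratio comes out close to 1 and the second clearly above 1, consistent with the expectation that a cumulative test on a population with varied baseline probabilities detects a deviation from the homogeneous model even though every individual sarcomere pops at random.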

      (6) Can the authors measure sarcomere rest length and check if this rest length is correlated with the popping probability of individual sarcomeres?

      Yes, we performed this analysis. As shown in Figure 3H (previously Fig. 4E), we found a positive correlation between sarcomere resting length and popping frequency, confirming that longer sarcomeres have a higher probability of popping.

      Importantly, however, the popping probability remains non-zero even for shorter sarcomeres. As detailed in our response to Reviewer #1 regarding this figure, we interpret resting length as a significant modulating factor that influences popping probability, rather than the sole determinant of the phenomenon.

      (7) Several mathematical models of sarcomere contraction exist (e.g., crossbridge models).

      (8) Could the authors perform computer simulations of several such stochastic sarcomere models coupled in series?

      Alternatively, could the authors discuss this?

      As I understand, references 16-18 model myofibril contraction assuming static variability of sarcomeres, but do not account for stochasticity in the contractility of individual sarcomeres.

      We thank the reviewer for this excellent suggestion. We have performed such simulations, and the theoretical model is a central component of our revised manuscript (new Figures 4 and 5; manuscript lines 316ff).

      As the reviewer points out, previous models (e.g., refs 12 and 14 in our manuscript) have often relied on predefined static variability between sarcomeres to explain heterogeneous behavior. Our work takes a fundamentally different approach. We model the myofibril as a chain of serially coupled sarcomeres, where the dynamics of each unit are governed by an underdamped Langevin equation. This formulation inherently incorporates stochasticity and describes the interplay between a non-monotonic, velocity-dependent active force, a length-dependent passive force, and the mechanical coupling to its neighbors.

      Crucially, the model parameters were not assumed, but were instead inferred by fitting the model directly to our experimental data using a gradient-free optimization algorithm. This data-driven stochastic model was sufficient to quantitatively reproduce key observed phenomena, including high-frequency oscillations and popping events. Our central finding is that these complex behaviors emerge naturally from the coupled system, driven by the non-monotonic force-velocity relationship and intrinsic stochastic fluctuations. This demonstrates that predefined static heterogeneity is not required to explain the observed dynamics.

      (9) The manuscript could be shortened (e.g., lines 52-56 in the introduction provide little extra value).

      We have significantly revised the entire manuscript to improve clarity and readability. We have removed sentences in the introduction as suggested and substantially restructured major sections. One of the main reasons for this was the integration of our theoretical model, which was originally prepared as a separate manuscript. This required us to completely reframe the introduction and reorganize the figures and results.

      We are confident that these extensive changes have resulted in a stronger, more concise and impactful paper that now integrates our experimental findings with a theoretical model.

      (10) Figure 2 is overloaded with data. Several panels could be moved to the SM without compromising the key message.

      Introducing the notation in Figures 2A-C does not seem ideal to me; maybe add a cartoon?

      We agree that Fig. 2 was dense. We have redesigned panels A-F to improve clarity and better guide the reader. We now use a consistent color-coding scheme to link the extrema in the phase portraits (A-C) to the corresponding distributions of individual sarcomeres (E-G). We have also revised the accompanying text to make the figure's logic more transparent.

      We have considered moving panels A-C to the supplementary materials. However, we believe their placement in the main text is crucial for two reasons:

      (1) Revealing Core Dynamics: The length-velocity phase portrait is the first visualization that reveals the underlying near-oscillatory dynamics of individual sarcomeres. This was not an assumed behavior but a critical experimental observation that directly motivated our entire theoretical modeling effort. We now also provide animated versions of these plots (Movies X-Y) to further illustrate these complex dynamics.

      (2) Enabling Model-Experiment Comparison: A phase portrait is a standard tool for comparing experimental data with theoretical models. Retaining it in the main text allows us to directly compare data and model in our new Figures 4 and 5, providing a clear validation of our model.

      (11) Similarly, Figures 4F, G, and H seem dispensable to me.

      (I also wonder how clear the analogy of a coin flip is if a biased coin with probabilities p and 1-p needs to be used.)

      We agree that the previous Figure 4F, which served a purely illustrative purpose, was dispensable and have removed it. The "coin flip" analogy was potentially confusing and we have removed it.

      As part of a broader restructuring of the manuscript, the quantitative analyses from the original Figures 4G and 4H are now presented as Figures 3I and 3J. They provide important supporting evidence for the stochastic nature of the resulting popping events. We believe retaining this quantitative analysis is valuable, and we hope that by streamlining the figure and removing the analogy, we have addressed the reviewer's concerns.

      (12) Equation (1) is unnecessarily complicated. The same holds for Equation (2).

      It might make sense to separate definitions for serial and mutual correlations.

      (This would also simplify the axes labels in Figure 3C.)

      (13) The notation used in Equation (1) is not fully clear.

      I assume t denotes a unit-less time index and T is the unit-less duration of a contraction cycle, measured in multiples of a fixed time interval?

      Regarding comments (12) and (13):

      We thank the reviewer for these helpful suggestions. In response to comment (12), we have separated the definitions for the mutual (r_m) and serial (r_s) correlation coefficients, presenting them as distinct calculations rather than as special cases of a single, more complex formula. This makes their definitions more direct and explicit. The calculation for the serial correlation coefficient has also been streamlined into a concise inline definition.

      In response to comment (13), we have clarified the notation in Equation (1). In the manuscript text (lines 208f), we now explicitly state that 𝑡 represents the discrete, unitless time index (i.e., the frame number) within a time-series, and 𝑇 is the total number of frames (i.e., the total duration in frames) of a given contraction cycle.

      While Equation (1) itself is the standard definition of the uncentered correlation coefficient and cannot be algebraically simplified, we have added text to specify this and justify its use. This metric (equivalent to cosine similarity) is appropriate for our analysis, as it assesses the similarity in the shape of motion patterns without subtracting their mean values and is insensitive to their overall amplitude.

      Finally, to further streamline the paper, we have removed the velocity correlation analysis and the corresponding parts of Figure 3.
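      A minimal Python sketch of the correlation measure discussed above, assuming that Equation (1) has the standard uncentered form r = Σ_t x(t) y(t) / sqrt(Σ_t x(t)² · Σ_t y(t)²) over the T frames of a contraction cycle. The function name and the toy ΔSL trace are hypothetical. Under the definitions described above, r_m would presumably compare two sarcomeres within the same cycle and r_s the same sarcomere in consecutive cycles; how cycles of unequal length are aligned is not specified here, so the sketch simply requires equal length.

```python
import math


def uncentered_corr(x, y):
    """Uncentered correlation (cosine similarity) of two series over the same T frames."""
    if len(x) != len(y):
        raise ValueError("series must have the same number of frames T")
    num = sum(a * b for a, b in zip(x, y))
    return num / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Hypothetical length-change trace (um) of one sarcomere over T = 5 frames.
cycle = [0.0, -0.05, -0.12, -0.08, -0.02]
scaled = [2.0 * v for v in cycle]   # same shape, twice the amplitude
offset = [v + 0.1 for v in cycle]   # same shape, shifted by a constant
print(uncentered_corr(cycle, scaled), uncentered_corr(cycle, offset))
```

      The scaled copy gives r = 1, showing that the measure ignores amplitude, whereas the offset copy gives a much lower value: unlike the Pearson coefficient, the uncentered form keeps the zero of the signal (here, the resting length) as a meaningful reference.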

      (14) The authors should make clear in all figures what is experiment and what is simulation.

      We have now clarified the nature of each graph in the figure captions.

      (15) The caption of Figure 3C could be simplified.

      We have simplified all figure captions.

      (16) I found Figure 3A hard to understand.

      We concluded that Figure 3A was confusing and did not add essential information to the manuscript. We have removed it entirely.

      Reviewer #3 (Recommendations For The Authors):

      In conclusion, I think that the manuscript would gain a lot if a more precise and more quantitative interpretation of the results were given. This might require a collaboration with theorists.

      We have integrated a novel theoretical framework into the revised manuscript (new Figures 4 and 5; manuscript lines 300ff), as described above.

      This new section introduces a data-driven, stochastic dynamical model that simulates the myofibril as a chain of serially coupled sarcomeres. Each sarcomere's motion is governed by an underdamped Langevin equation, a formulation that inherently accounts for stochasticity. Crucially, our model incorporates a non-monotonic force-velocity relationship inferred directly from our experimental data, rather than relying on predefined static variability between sarcomeres, a key distinction from previous theoretical work.

      This integrated model successfully and quantitatively reproduces all major experimental phenomena described in the paper, including high-frequency oscillations and stochastic "popping" events. It demonstrates that these complex behaviors emerge naturally as dynamic instabilities from the coupled system. This addition elevates the manuscript from a descriptive study to one that provides a predictive, mechanism-driven framework for understanding sarcomere dynamics.

    1. eLife Assessment

      This is a theoretical analysis that gives compelling evidence that length control of bundles of actin filaments undergoing assembly and disassembly emerges even in the absence of a length control mechanism at the individual filament level. Furthermore, the length distribution should exhibit a variance that grows quadratically with the average bundle length. The experimental data are compatible with these fundamental theoretical findings, but further investigations are necessary to make the work conclusive concerning the validity of the inferences for filamentous actin structures in cells.

    2. Reviewer #1 (Public review):

      Actin filaments and their kinetics have been the subject of extensive research, with several models for filament length control already existing in the literature. The work by Rosario et al. focuses instead on bundle length dynamics and how their fluctuations can inform us on the underlying kinetics. Surprisingly, the authors show that irrespective of the details, typical "balance point" models for filament kinetics give the wrong scaling of bundle length variance with mean length compared to experiments. Instead, the authors show that if one considers a bundle made of several individual filaments, length control for the bundle naturally emerges even in the absence of such a mechanism at the individual filament level. Furthermore, the authors show that the fluctuations of the bundle length display the same scaling with respect to the average as experimental measurements from different systems. This work constitutes a simple yet nuanced and powerful theoretical result that challenges our current understanding of actin filament kinetics and helps relate accessible experimental measurements such as actin bundle length fluctuations to their underlying kinetics. Finally, I found the manuscript to be very well written, with a particularly clear structure and development, which made it very accessible.

      Comments on revisions:

      I maintain my original favorable assessment of this manuscript.

      I thank the authors for considering my comments and for their thoughtful replies. It would have been helpful to see some of the comments reflected in the text and discussion. I leave this to the authors.

      I appreciate that the authors replaced the figures with higher-resolution versions, but I maintain my assessment that the graphical and aesthetic quality of the figures, especially the size of the legends (which are often tiny and difficult to read), labels, colors, etc., could be improved. Again, I leave this to the authors.

    3. Reviewer #2 (Public review):

      The authors present a theoretical study of the length dynamics of bundles of actin filaments. They first analyze a "balance point model" in which the bundle is described as an effective polymer. The corresponding assembly and disassembly rates can depend on bundle length. This model generates a steady-state bundle-length distribution with a variance that is proportional to the average bundle length. Numerical simulations confirm this analytic result. The authors then present an analysis of previously published length distributions of actin bundles in various contexts and argue that these distributions have variances that depend quadratically on the average length. They then consider a bundle of N independent filaments that each grow in an unregulated way. Defining the bundle length to be that of the longest filament, the resulting length distribution has a variance that does scale quadratically with the average bundle length.

      The manuscript is very well written, and the computations are nicely presented. The work gives fundamental insights into the length distribution of filamentous actin structures. The universal dependence of the variance on the mean length is of particular interest. It will be interesting to see in the future how many universality classes there are, and which features of a growth process determine to which class it belongs.

      Comments on revisions:

      I thank the authors for their detailed and thorough answers to the points that had been raised. I have no further recommendations.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This is a theoretical analysis that gives compelling evidence that length control of bundles of actin filaments undergoing assembly and disassembly emerges even in the absence of a length control mechanism at the individual filament level. Furthermore, the length distribution should exhibit a variance that grows quadratically with the average bundle length. The experimental data are compatible with these fundamental theoretical findings, but further investigations are necessary to make the work conclusive concerning the validity of the inferences for filamentous actin structures in cells.

      We think this is an excellent assessment of the article. We suggest adding a sentence after the first one: “The distribution of bundle lengths is not Gaussian but Gumbel, since the bundle length is the length of the longest filament in the bundle.”

      Public Reviews:

      Reviewer #1 (Public Review):

      Actin filaments and their kinetics have been the subject of extensive research, with several models for filament length control already existing in the literature. The work by Rosario et al. focuses instead on bundle length dynamics and how their fluctuations can inform us of the underlying kinetics. Surprisingly, the authors show that irrespective of the details, typical "balance point" models for filament kinetics give the wrong scaling of bundle length variance with mean length compared to experiments. Instead, the authors show that if one considers a bundle made of several individual filaments, length control for the bundle naturally emerges even in the absence of such a mechanism at the individual filament level. Furthermore, the authors show that the fluctuations of the bundle length display the same scaling with respect to the average as experimental measurements from different systems. This work constitutes a simple yet nuanced and powerful theoretical result that challenges our current understanding of actin filament kinetics and helps relate accessible experimental measurements such as actin bundle length fluctuations to their underlying kinetics. Finally, I found the manuscript to be very well written, with a particularly clear structure and development, which made it very accessible.

      We are grateful to Reviewer #1 for this very favorable assessment.

      Reviewer #2 (Public Review):

      Summary:

      The authors present a theoretical study of the length dynamics of bundles of actin filaments. They first analyze a "balance point model" in which the bundle is described as an effective polymer. The corresponding assembly and disassembly rates can depend on bundle length. This model generates a steady-state bundle-length distribution with a variance that is proportional to the average bundle length. Numerical simulations confirm this analytic result. The authors then present an analysis of previously published length distributions of actin bundles in various contexts and argue that these distributions have variances that depend quadratically on the average length. They then consider a bundle of N independent filaments that each grow in an unregulated way. Defining the bundle length to be that of the longest filament, the resulting length distribution has a variance that scales quadratically with the average bundle length.

      Strengths:

      The manuscript is very well written, and the computations are nicely presented. The work gives fundamental insights into the length distribution of filamentous actin structures. The universal dependence of the variance on the mean length is of particular interest. It will be interesting to see in the future how many universality classes there are, and which features of a growth process determine to which class it belongs.

      Weaknesses:

      (1) You present the data in Fig. 3 as arguments against the balance point model. Although I agree that the data is compatible with your description of a bundle of filaments, I think that the range of mean lengths you can explore is too limited to conclusively argue against the balance point model. In most cases, your data extend over half an order of magnitude only. Could you provide a measure to quantify how much your model of independent filaments fits better than the balance point model?

      Indeed, we agree that the experimental data we present, each on their own, provide inconclusive evidence of the scaling predicted by our model. However, in aggregate, as presented in Fig. 3E, the data make for compelling evidence of scaling of the variance with the average length squared, as quantified by the power-law fit. Also, we think that Fig. 3E argues strongly against the Balance Point Model, because the data do not conform with simple linear scaling (indicated by the dashed line in Fig. 3E). Regardless, we agree with the referee that better data is needed to make a more convincing case, and we see this paper as a call to arms to collect such data in the future. The published data we used (other than our own data from experiments on yeast actin cables) is from experiments that were not designed with this question in mind, i.e., how do length fluctuations scale with the mean?
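      One way to quantify the comparison the reviewer asks for is sketched below, assuming paired (mean length, variance) values per data set; the function name and the synthetic data are hypothetical and do not reproduce the analysis behind Fig. 3E. It fits log(variance) against log(mean) with a free exponent and, separately, with the exponent fixed at 1 (the linear scaling of the Balance Point Model), and reports the residual sum of squares of each fit in log space.

```python
import numpy as np


def compare_power_law_to_linear(mean_lengths, variances):
    """Fit log(var) = a*log(mean) + b and compare with a fixed exponent of 1.

    Returns (fitted exponent, RSS of free-exponent fit, RSS of exponent-1 fit),
    with residuals measured in log space.
    """
    x = np.log(np.asarray(mean_lengths, dtype=float))
    y = np.log(np.asarray(variances, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    rss_free = float(np.sum((y - (slope * x + intercept)) ** 2))
    # Best intercept when the exponent is fixed at 1 (linear variance-mean scaling).
    intercept_lin = float(np.mean(y - x))
    rss_lin = float(np.sum((y - (x + intercept_lin)) ** 2))
    return float(slope), rss_free, rss_lin


# Hypothetical data obeying var = 0.05 * mean**2 exactly.
means = np.array([1.0, 2.0, 4.0, 8.0])
slope, rss_free, rss_lin = compare_power_law_to_linear(means, 0.05 * means**2)
print(slope, rss_free, rss_lin)
```

      Because the two fits have different numbers of free parameters, a penalized criterion such as AIC would be a fairer comparison on real, noisy data than the raw residuals alone.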

      (2) Concerning your bundled-filament model, why do you consider the polymerizing ends to be all aligned? Similarly to the opposite end, fluctuations should be present. Furthermore, it is not clear to me where the presence of crosslinking proteins enters your description. Finally, linked to my first remark on this model, why is the longest filament determining the length of the bundle in all the biological examples you cite? I am thinking in particular about the actin cables in yeast.

      In the case of the yeast actin cables (which grow from the bud neck into the mother cell), we know that the formins that polymerize the actin filaments are spatially aligned at the bud neck. In the cases of stereocilia and microvilli, again the polymerizing ends of the actin filaments are well-aligned at the growing tips of these bundled actin structures, as indicated by classic EM studies from Lew Tilney and others. The alignment of polymerizing actin filament ends is more difficult to assess at the leading edge of lamellipodia, because of the undulating shape of the polymerization (membrane) surface. In fact, this could be the reason why data from the lamellipodia experiments deviate from the line in Fig. 3E, in contrast to the data from the other three structures (this is discussed in some detail in the Supplement). Regarding the actin crosslinkers, the only role they play in our model is keeping the filaments connected in the bundle. As for why the longest filament in the actin cable is the one that specifies the length of the cable, this is addressed in more detail in our McInally et al., 2024 (PNAS) paper, where we measured cable length by segmenting the fluorescence signal of the cable. Therefore, the filaments in the bundle that extend the furthest define the reported length. Also, given the function of the cables for transporting vesicles, the furthest reach of the filaments in the bundle defines the area from which the vesicles are collected.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      An important result of the model proposed by the authors is that the relationship between bundle mean length and variance should also inform the number of filaments in the bundle (Equation 13). In the SI the authors thus predict from fitting experimental results that bundles should be made of around 173 filaments, which is larger than most values proposed in the literature (and quoted in this work), except for stereocilia. Can the authors comment on this?

      This is an interesting point that we have been thinking about. Indeed, the model does relate the number of filaments to the variance of the length, but this dependence is logarithmic and therefore insensitive to changes in the number of filaments. Consequently, the number 173 comes with very large error bars and should be thought of more like a few hundred filaments in terms of the precision with which we can extract this number from data. We make this point more clearly in the revised SI, where we now say that based on the data the best we can do is say that the number of filaments is between 80 and 400.
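      The logarithmic insensitivity can be illustrated with a short sketch, assuming the standard large-N Gumbel results for N filaments with exponentially distributed lengths of mean ⟨l⟩: ⟨L⟩ ≈ ⟨l⟩(ln N + γ) and Var(L) ≈ (π²/6)⟨l⟩², so the coefficient of variation is CV = σ_L/⟨L⟩ ≈ π/(√6 (ln N + γ)). These textbook forms may differ in detail from Equations 12 and 13, and the function names are hypothetical. Inverting gives N ≈ exp(π/(√6 CV) − γ), so N depends exponentially on 1/CV.

```python
import math

EULER_GAMMA = 0.5772156649015329


def cv_from_n(n: float) -> float:
    """Coefficient of variation of bundle length, large-N Gumbel approximation."""
    return math.pi / (math.sqrt(6.0) * (math.log(n) + EULER_GAMMA))


def n_from_cv(cv: float) -> float:
    """Invert cv_from_n; the filament number enters only through ln(N)."""
    return math.exp(math.pi / (math.sqrt(6.0) * cv) - EULER_GAMMA)


for n in (80, 173, 400):
    print(n, round(cv_from_n(n), 3))
```

      Under these assumptions a CV of about 0.26 corresponds to N ≈ 80 and a CV of about 0.195 to N ≈ 400, so a modest uncertainty in the measured CV translates into a fivefold range in the inferred number of filaments.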

      Along the same lines, in their derivation of Equations 12 and 13 (a key result of the manuscript) the authors make some approximations that are only valid for large N (number of filaments in the bundle). Is this approximation valid for actin cables or filopodia, estimated to comprise only around 10 filaments?

      Indeed, even for N=10 filaments the approximate formulas have errors that are well below what can be measured. We consider the details of the approximation in deriving Equations 12 and 13 from the exact distribution (Equation 11) in the Supplemental section “Distribution of bundle lengths when individual filament lengths are exponentially distributed”. For example, the exact result involves the harmonic number, which for N=10 is 2.93, while the approximate formula ln(N) + γ that we use yields 2.88, a fractional error of less than 2%.
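      The size of this approximation error can be checked directly. The sketch below (function names are hypothetical) compares the exact harmonic number H_N = 1 + 1/2 + ... + 1/N, which for exponentially distributed filament lengths gives the exact mean of the longest of N filaments in units of ⟨l⟩, with the large-N approximation ln(N) + γ.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic_number(n: int) -> float:
    """Exact H_N = 1 + 1/2 + ... + 1/N."""
    return sum(1.0 / k for k in range(1, n + 1))


def harmonic_approx(n: int) -> float:
    """Large-N approximation H_N ~ ln(N) + gamma."""
    return math.log(n) + EULER_GAMMA


for n in (10, 100, 173):
    exact, approx = harmonic_number(n), harmonic_approx(n)
    print(n, round(exact, 3), round(approx, 3), f"{(exact - approx) / exact:.2%}")
```

      The relative error shrinks roughly as 1/(2N H_N), so it is already below 2% at N=10 and well below 1% for bundles of a hundred or more filaments.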

      A key assumption of the model is that the bundle length corresponds to the maximum individual filament length inside the bundle. Couldn't bundles comprise several filaments one after another, head-to-tail? What do the authors expect then?

      Excellent point. Indeed, this is precisely the geometry of the yeast actin cable. In our previously published McInally et al., 2024 (PNAS) paper we worked out the math in that case and found that the main result about the variance holds. In this paper we presented a simpler model that retains the same features as the one presented in the PNAS paper, to better accentuate the origins of the scaling of the variance with the mean length, which is simply the result of bundling and identifying the length of the bundle with the length of the longest filament (or, more precisely, the furthest-extending filament) in the bundle.

      The model also allows us to relate the bundle length fluctuations and average to the individual filament characteristic length (Equations 12 and 13 again). Can the authors comment on the values of 〈l〉 they would obtain for experimental data?

      It is hard to give a precise number, as we would also need to know the number of filaments in the bundle, and for that we would need better electron microscopy data (which has proven difficult for the field to obtain). Still, with typical filament numbers in the 10s to 100s, the expected average filament length is roughly ln(10) to ln(100), i.e., 2 to 5 times, smaller than the average bundle length.
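      The ratio of average bundle length to average filament length can also be estimated by simulation. The minimal Monte Carlo sketch below assumes, as in the simple bundle model, that each filament length is drawn independently from an exponential distribution with mean ⟨l⟩ = 1 and that the bundle length is the longest filament; the function name, sample size, and seed are hypothetical choices.

```python
import random
import statistics


def bundle_to_filament_ratio(n_filaments: int, n_bundles: int = 10000, seed: int = 1) -> float:
    """Monte Carlo estimate of <L>/<l> when L is the longest of n_filaments
    independent, exponentially distributed filament lengths with mean <l> = 1."""
    rng = random.Random(seed)
    bundle_lengths = [
        max(rng.expovariate(1.0) for _ in range(n_filaments)) for _ in range(n_bundles)
    ]
    return statistics.fmean(bundle_lengths)


for n in (10, 100):
    print(n, round(bundle_to_filament_ratio(n), 2))
```

      The estimates should land close to the exact values H_10 ≈ 2.93 and H_100 ≈ 5.19, consistent with filaments that are on average a few times shorter than the bundle.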

      I find the Methods section a bit underwhelming. In particular, can the authors give more details on their treatment of experimental data? Bootstrap sampling is mentioned, but there is no information on the size of the original data sets, which could affect the validity of such a method.

      Thanks for the criticism. We have added details regarding the sizes of the data sets used in the analysis in the Methods section.

      Along the same lines, is the graph in Figure 1E the result of a simulation like the ones the authors used to obtain their result or is it just a schematic? If the first, I would suggest replacing it with an actual simulated length trajectory. In general, I think this work would benefit from more detailed explanations and examples of how stochastic trajectories were computed and analysed.

      This is also a good point. We still prefer to keep the schematic in this figure, since our goal here is to define the question before we commence with computations and data analysis. The stochastic trajectories were generated using the standard Gillespie algorithm, and the statistics of length were gathered once the length dynamics reached steady state. We explain this in the Methods section and give more details in the Supplement.
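      As a minimal sketch of such a Gillespie simulation, assuming one simple balance-point-type model (constant assembly rate k_on and a disassembly rate k_off·L that grows with length; the rates, function name, and burn-in time are hypothetical, not taken from the manuscript), the code draws exponential waiting times, chooses a reaction in proportion to its propensity, and time-weights the length statistics after a burn-in. For this particular model the steady state is Poisson with mean k_on/k_off, so the variance equals the mean, the linear variance-mean scaling described above for balance point models.

```python
import random


def gillespie_length_stats(k_on, k_off, t_max, t_burn, seed=0):
    """Gillespie simulation of L -> L+1 (rate k_on) and L -> L-1 (rate k_off * L).

    Returns the time-weighted mean and variance of L after the burn-in time t_burn.
    """
    rng = random.Random(seed)
    length, t = 0, 0.0
    w_total = s1 = s2 = 0.0
    while t < t_max:
        a_plus = k_on                   # assembly propensity (length independent here)
        a_minus = k_off * length        # disassembly propensity grows with length
        a_total = a_plus + a_minus
        dt = rng.expovariate(a_total)   # waiting time to the next event
        # Weight the current length by the time spent in it (after burn-in only).
        w = min(t + dt, t_max) - max(t, t_burn)
        if w > 0:
            w_total += w
            s1 += w * length
            s2 += w * length * length
        t += dt
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_plus:
            length += 1
        else:
            length -= 1
    mean = s1 / w_total
    return mean, s2 / w_total - mean * mean


mean, var = gillespie_length_stats(k_on=20.0, k_off=1.0, t_max=5000.0, t_burn=50.0)
print(mean, var, var / mean)  # variance/mean ratio should be close to 1 for this model
```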

      Finally, while I find the writing in this manuscript to be excellent, I think the figures require some work. The schematics and drawings, which are very low resolution, the font size for the axes, and the choice of colours all make it more cumbersome than necessary to understand what is being shown.

      Thank you for pointing this out. We have made better versions of the figures.

      Reviewer #2 (Recommendations For The Authors):

      "In this case, the length distribution of the bundle derived from extreme value statistics, leads to a peaked non-Gaussian distribution, even when filaments within the bundle are unregulated and exponentially distributed."

      You mention "extreme value statistics" only once, in the introduction. I would suggest that you come back to this notion and explain how your results connect to extreme value statistics or delete it from the manuscript.

      Good point. We added a sentence to draw the reader’s attention to the fact that our result is an extreme value distribution (Equation 11 is the Gumbel distribution) used in statistics of extreme events.

      This is a follow-up of one of my major points of criticism: Fig. 3A: why do you fit (if I understand correctly) the blue and orange data points with the same power law? For (A-D), the data extend over less than an order of magnitude. Why is a power-law fit appropriate? Can you quantify how much better your fits are compared to a linear dependence? Bundling the data of all structures yields a common master curve (with the exception of filopodia). This is quite remarkable, I think, and merits more discussion than is currently given in the manuscript.

      Good point. We should have been clearer. In Figures 3A-D we show individual data sets for the different bundle structures and compare the prediction of the Balance Point Model (dashed line) to the data. We also fit a power law to show that the data are consistent with the Bundle model. This comparison is made much clearer in Figure 3E.

      Fig 1B, right does not show the addition and removal of subunits - Fig. 1C does. Panel C is not explained in the caption. The second appearance of (D) in the caption could be omitted.

      Good points. We fixed these issues in the new version of the Figure and caption.

      "For individual actin filaments (...)" I found this and the following paragraph slightly confusing at first reading: as long as you write about single filaments, do you have annealing in mind, where two filaments merge and form a longer filament? In case you consider a bundle, do you consider a filament that is cross-linked to other filaments and thereby added to the bundle? Similarly for removing filament segments (severing or unbundling)? Probably, my confusion is a consequence of you seemingly using filament to describe bundles as well as single actin filaments.

      Sorry for the confusion. We tried to be consistent throughout the text and use “filament” to denote a single actin filament and “bundle” a collection of parallel filaments crosslinked together. The assembly and disassembly dynamics of the filaments in the bundle are only relevant to the extent that they affect the length distribution of individual filaments. The main result is largely independent of that (as demonstrated in the Supplement by considering different single-filament distributions) once we decide that the length of the bundle is given by the length of the longest filament in the bundle. This is the point of extreme value statistics, where a universal Gumbel distribution for the length of the longest filament in the bundle arises independent of the length distribution of a single filament (this result is akin to the Central Limit Theorem, which predicts a Gaussian distribution of the mean of a large number of random numbers irrespective of the distribution they are drawn from).
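      The approach to the Gumbel form can be checked numerically for the exponential case treated in the Supplement. Assuming N filaments with independent exponential lengths of unit mean, the exact distribution function of the longest one is (1 − e^(−x))^N, and the Gumbel limit with location ln N and unit scale is exp(−N e^(−x)). The sketch below (function name and grid are hypothetical) measures the largest gap between the two; it shrinks roughly as 1/N. It covers only the exponential case: the Gumbel limit holds for a broad class of light-tailed single-filament distributions, not for every distribution.

```python
import math


def max_cdf_gap(n: int, x_max: float = 30.0, steps: int = 30000) -> float:
    """Largest |P(L <= x) - Gumbel(x)| on a grid, for L = max of n Exp(1) lengths."""
    gap = 0.0
    for i in range(steps + 1):
        x = x_max * i / steps
        exact = (1.0 - math.exp(-x)) ** n       # exact CDF of the longest filament
        gumbel = math.exp(-n * math.exp(-x))    # Gumbel CDF, location ln(n), scale 1
        gap = max(gap, abs(exact - gumbel))
    return gap


for n in (10, 100, 1000):
    print(n, max_cdf_gap(n))
```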

      "In Figure 4D, the variance of the filopodia lengths" Probably Figure 3D?

      Yes. Thank you. We fixed this.

      "The filopodia data seemingly has the same slope (...) but with variances higher than what is measured for other actin structures." This finding does not contradict the main statement of a nonlinear scaling of the variance with the mean length, right? I therefore find this discussion slightly peripheral and also confusing. Also, what is the reason to assume that EM might get the actual length of filopodia wrong by a factor of 2 to 3?

      The issue with filopodia is that the way the lengths are measured is by the extent to which the structure as a whole protrudes from the cell. This leaves unresolved the lengths of the actual filaments in the structure, and we suspect that they are longer as they extend into the cytoplasm. This would contribute to the shift off the common curve in the direction that is observed (larger variance associated with smaller average length). We have no way to justify that this would lead to a 2-3 factor other than that would be enough to collapse the data onto the common curve. Clearly more careful experiments are needed to resolve the issue. We added some clarifying remarks to this effect into the discussion.

      Eq.(14) What is Z?

      Thanks for pointing out this omission. Z = L/<L> and we have added that in the formula where Z appears.

      LIST OF CHANGES

      Here we summarize the changes we made to the manuscript and the Supplementary material in response to the reviewers.

      (1) Fixed typo: the Figure 1 legend had two parts labelled D, which have been relabelled D and C. The explanation of panel C has been added.

      (2) Fixed typo: The incorrect call to Figure 4D is now corrected to Figure 3D.

      (3) In the Supplementary material we made more precise our estimate of the number of filaments. The wording “From this we can estimate the number of filaments. We find, with a confidence interval of…” we have changed to “From this we can estimate the number of filaments to be between 80 and 400 which compares favourably to the typical number of filaments in the different actin structures that were analyzed.”

      (4) In the Methods section we added the number of measured filament lengths in the different data sets used in the analysis.

      (5) We made better (higher resolution) versions of all the Figures.

    1. eLife Assessment

      This valuable study explores changes in the Drosophila microbiome in response to environmental temperature over more than ten years. The evidence showing that temperature leads to diversification of bacterial clades is solid, but additional information would help clarify how subspecies competition impacts microbiome composition and the host. The work will interest researchers working with microbiomes, microbial ecology, and evolutionary biology.

    2. Reviewer #1 (Public review):

      Summary:

      The factors that create and maintain diversity in host-associated microbiomes remain poorly understood. A better understanding of these factors will help in the efforts to leverage the adaptive potential of the microbiome to help solve pressing problems in health and agriculture.

      Experimental evolution provides a promising path forward as we can track the causes and consequences in the emergence of novel variants, but experimental evolution remains underutilized in host-microbiome interactions. Here, Gracia-Alvira et al. utilize a long-term experimental evolution study in Drosophila simulans under hot and cold temperature regimes to identify strain-level variation in an important fly bacterium, Lactiplantibacillus plantarum. They identify three strains of L. plantarum, which are most prevalent in their respective three temperature regimes, suggesting that these are locally adapted bacteria. Then, using a combination of genomics, in vitro, and in vivo approaches, Gracia-Alvira et al. attempt to understand the factors that led to the differentiation of the hot and cold L. plantarum and their impacts on the fly host.

      Strengths:

      This is an excellent use of experimental evolution to track the emergence of novelty in the microbiome. The genomic analyses are all solid and appropriate for the data sets. It is especially striking that the comparisons with the other, independent experimental evolution studies in different labs (and across continents between Portugal and South Africa) show a consistent response to temperature. Many have disregarded the microbiome as it is something that is too sensitive to seemingly innocuous variables (particularly in the fly microbiome), such that we cannot find generalities. However, this finding highlights the potential for experimental evolution to uncover these dynamics. The question of how strains emerge and are maintained is timely and is one of the key open questions in host-microbiome evolution currently.

      Weaknesses:

      (1) The framing in the title and throughout the discussion about "subspecies competition" does not match the data that was collected. The subspecies competition requires actually tracking the competitive outcomes between the hot, cold, and unevolved L. plantarum. In the in vivo work, I can see that mixes of the strains were made, but they did not track whether the cold strain outcompeted the hot strain in vivo under cold conditions, for example. While Figure 4 is suggestive that there is ongoing competition in the hot temperature regime, this is not necessarily shown in the cold, which is dominated by the C clade. It could also be that the bacteria cannot survive in the flies at the different temperatures. The growth curve assays hint that the bacteria can grow, but the plate reader couldn't actually maintain the 18 °C temperature (line 455). So all of this evidence is very indirect and insufficient to say that strain competition is driving these patterns.

      (2) The in vivo results are interesting in that there appears to be a fitness cost of clade C, but the explanation is underdeveloped. I say under-developed because in Figure 4, the cold L. plantarum remains much higher throughout adaptation to the hot temperature regime than the hot L. plantarum in the cold regime. The hot L. plantarum is low abundance throughout the cold regime. I felt like this observation was not explained, but it seems relevant to understanding the strain dynamics.

      I will also note that this is not the first time that L. plantarum or other Lactobacillus have been shown to exert fitness costs to Drosophila. Gould, PNAS, 2018, shows that both Lactobacillus plantarum and Lactobacillus brevis in mono-association have lower fitness (measured through Leslie matrix projections using lifespan and fecundity) than axenic flies. Many studies of wild Drosophila fail to find Lactobacillus, or it is low abundance (e.g., Chandler, PLoS Genetics, 2014; Wang, Environmental Microbiology Reports, 2018; Henry & Ayroles, Molecular Ecology, 2022; Gale, AEM, 2025). This might help provide useful context for the in vivo results.

      (3) The data in Figure 4 are compelling to focus on the L. plantarum variants. However, I can see from the methods that the competitive mapping included only other strains of Wolbachia. It is not clear how other members of the microbiome changed in response to the temperature regimes. As I note in point #2, given that Lactobacillus is often rare, it is not clear what the rest of the microbiome looks like over the course of adaptation. Indeed, it seems like Mazzucco & Schlotterer, PRSB, 2021 did a broader analysis of the microbiome and found that Acetobacter is by far the most common bacterium (I think this data is also part of the data shown here?). Expanding on why or why not in this context is important and will improve this study, particularly if the focus is on connecting these evolutionary dynamics to ecological competition to explain the emergence of strain diversity.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gracia-Alvira et al. investigated how environmental temperature affects competition among members of the microbiome, with a focus on intraspecific diversity, using the Drosophila model.

      Notably, the authors identified three clades of Lactiplantibacillus plantarum from a natural population of Drosophila simulans collected in Florida. They tracked the dynamics of these three bacterial clades under two temperature conditions over the course of more than ten years. Using comparative genomics and phylogeny, they showed that these three bacterial clades likely adapted to their host independently in a temperature-specific manner. Further, by combining in vitro culture and in vivo mono-association assays, they demonstrated the functional divergence of these three bacterial clades phenotypically, including their growth dynamics and effects on host fitness. Lastly, they performed pathway analysis and speculated on key genomic variance supporting such functional divergence.

      Strengths:

      The laboratory evolutionary experiment in response to cold or hot environmental temperature is impressive, given its more than ten years of experimental time period. This collection of archived microbiome samples paired with the fly host data can be a valuable resource for the field.

      Weaknesses:

      The laboratory evolutionary experiment can be limited due to its artificial experimental setup. For example, wild flies rely on a more diverse set of food sources and are constantly exposed to new bacterial inoculations, whereas under laboratory conditions, flies live in a more restricted ecosystem. In addition, environmental temperatures differ among different locations, but they also involve seasonal changes within the same region. This manuscript can be strengthened with further discussions that elaborate on these limitations.

      Moreover, the extent of host effects involved in these experiments remains ambiguous, because it is unclear whether these Lactiplantibacillus plantarum mostly reside within fly guts or on Drosophila medium. The laboratory evolutionary experiment possibly favored better colonizers on Drosophila medium under either cold or hot temperatures, which subsequently can saturate fly guts. As fully dissociating these variables can be experimentally tedious, the authors may want to comment more on these aspects in the discussion. Or they may want to consider some measurements. For example, measuring the growth rate of these bacteria on Drosophila medium under different temperatures, in addition to the current MRS culture experiments, or measuring the portion of the Lactiplantibacillus on Drosophila medium versus those stably colonizing fly guts.

    4. Reviewer #3 (Public review):

      Summary:

      The study presents an analysis of 297 pangenomes derived from 20 populations of Drosophila simulans, at 19 time points for fast-reproducing individuals in a hot environment, or at 10 time points for slow-reproducing individuals in a cold environment, over a period of more than 10 years. The authors select a particular microbial component of the pangenomes and study the dynamics of Lactiplantibacillus plantarum strains in two environments. They discover that the revealed operational taxonomic units could be divided into three phylogenetic clades, which have their own genomic and genetic features, different adaptive capabilities that depend on the environment, and have a distinct impact on the fitness of the host.

      Strengths:

      The authors prove that bacterial microbiome components are sensitive to the environment and could rapidly (within years) become fixed in eukaryotic populations. This study establishes a tractable model that potentially enables the study of variability in the physiological influence of distinct strains of an important commensal species, Lactiplantibacillus plantarum, on the Drosophila host. It is clearly shown that this single species consists of several phylogenetically and functionally diverse strains. The authors did not limit their interest to their own model; rather, they integrated a comparative approach by analysing phylogenetic relationships among 92 described L. plantarum strains.

      Overall, the study is novel and delivers important discoveries from a longitudinal, well-replicated experiment, generating a substantial amount of genomic data. It highlights an important dimension of research: that environmental selection operates at the subspecies level.

      Weaknesses:

      Even though the authors show only one particular example from their longitudinal experiment, they honestly acknowledge shortcomings important for interpreting the biological significance of the results (gnotobiotic mono-association experiments were done with D. melanogaster, but not D. simulans), and they therefore state the limitations of their conclusions (weaker effects in the non-axenic flies are due to the presence of other taxa or to higher-order interactions with other members of the microbiome). These interactions could significantly affect bacterial growth, metabolism, and physiological influence on the host.

      The authors exploit the results of their experiment to speculate about a wide range of evolutionary phenomena, like within-species competition, ecological adaptation and evolution of the host, fitness advantage of bacteria to the host, the benefits of parasitism or mutualism, the domestication of the microbiome, etc. At the end, they conclude that their study "highlights that even subspecies diversity plays a key role in adaptation to environmental temperature". However, the potential mechanisms of such adaptation are barely discussed, so that the focus of the study shifts from the temperature-induced changes in microbial population structures toward metabolism-related adaptations of clade representatives that enable them to diversify their carbon and nitrogen sources. The role of the temperature factor remains elusive.

      In addition, the paper takes a clearly minimalistic experimental approach to address the functional properties of the revealed L. plantarum strains, so that their own fitness, or their relationship with the Drosophila host, is characterised superficially. Therefore, the authors' discourse can be speculative rather than factual (especially when the authors use the expression "likely" to share their guesses in the "Results" section). Nevertheless, these minor drawbacks do not undermine the novelty of the discovered phenotypes or the importance of their further investigation.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The factors that create and maintain diversity in host-associated microbiomes remain poorly understood. A better understanding of these factors will help in the efforts to leverage the adaptive potential of the microbiome to help solve pressing problems in health and agriculture.

      Experimental evolution provides a promising path forward, as we can track the causes and consequences of the emergence of novel variants, but it remains underutilized in the study of host-microbiome interactions. Here, Gracia-Alvira et al. use a long-term experimental evolution study in Drosophila simulans under hot and cold temperature regimes to identify strain-level variation in an important fly bacterium, Lactiplantibacillus plantarum. They identify three strains of L. plantarum, which are most prevalent in their respective three temperature regimes, suggesting that these are locally adapted bacteria. Then, using a combination of genomic, in vitro, and in vivo approaches, Gracia-Alvira et al. attempt to understand the factors that led to the differentiation of the hot and cold L. plantarum and their impacts on the fly host.

      Strengths:

      This is an excellent use of experimental evolution to track the emergence of novelty in the microbiome. The genomic analyses are all solid and appropriate for the data sets. It is especially striking that the comparisons with the other, independent experimental evolution studies in different labs (and across continents between Portugal and South Africa) show a consistent response to temperature. Many have disregarded the microbiome as too sensitive to seemingly innocuous variables (particularly in flies) for generalities to emerge. However, this finding highlights the potential for experimental evolution to uncover these dynamics. The question of how strains emerge and are maintained is timely and is currently one of the key open questions in host-microbiome evolution.

      Weaknesses:

      (1) The framing in the title and throughout the discussion about "subspecies competition" does not match the data that was collected. The subspecies competition requires actually tracking the competitive outcomes between the hot, cold, and unevolved L. plantarum. In the in vivo work, I can see that mixes of the strains were made, but they did not track whether the cold strain outcompeted the hot strain in vivo under cold conditions, for example.

      We thank the reviewer for this honest concern and take this opportunity to defend our claim of "subspecies competition" used across the manuscript. As the reviewer states, subspecies competition requires tracking the competitive outcomes between the three clades, and this is what we did by sampling and sequencing across ten years of experimental evolution (Figures 4 and S3). For this reason, we point out that the subspecies competition assessment comes from the direct observation of changes in relative abundance across the time series, and not from the follow-up experiments in vivo or in vitro.

      While Figure 4 suggests that there is ongoing competition in the hot temperature regime, this is not necessarily shown in the cold, which is dominated by the C clade. It could also be that the bacteria cannot survive in the flies at the different temperatures. The growth curve assays hint that the bacteria can grow, but the plate reader couldn't actually maintain the 18 °C temperature (line 455). So all of this evidence is very indirect and insufficient to say that strain competition is driving these patterns.

      We thank the reviewer for this alternative hypothesis that could explain the observed subspecies dynamics. We rule out the possibility that clade C dominates in the cold because the other two clades cannot grow in this regime, based on three pieces of evidence:

      (1) In the time series, clades H and U decrease, but never disappear (Figures 4 and S3), even showing some peaks of abundance in specific replicate populations (Figure S3).

      (2) We isolated individuals belonging to clade H from the cold-evolved populations, as shown in Figure 2. This is direct evidence that clade H persists in the cold-evolved populations, albeit at low abundance.

      (3) We grew the three clades on fly food in Petri dishes incubated at both temperature regimes and observed growth in all cases.

      We will include the food growth experiment in the revised manuscript as further supporting evidence for growth in both regimes.

      (2) The in vivo results are interesting in that there appears to be a fitness cost of clade C, but the explanation is underdeveloped. I say underdeveloped because in Figure 4, the cold L. plantarum remains much higher throughout adaptation to the hot temperature regime than the hot L. plantarum in the cold regime. The hot L. plantarum remains at low abundance throughout the cold regime. I felt that this observation was not explained, but it seems relevant to understanding the strain dynamics.

      We acknowledge that a strong fitness cost of clade C is observed in axenic D. melanogaster. In the native host, D. simulans, with a reduced microbiome, we observed delayed development that could even be an advantage depending on the situation, as pointed out by reviewer 3 in the recommendations.

      Even if we assume that flies colonized with clade C are less fit in the experimental evolution, another open question is whether the flies can actively select for a given L. plantarum clade. Under this assumption, a clade that imposes a fitness cost on the fly (clade C) should be selected against over time, because the flies colonized by this clade will have fewer offspring, or develop later than the rest. Alternatively, as the microbiome is shared among all the individuals in the population, the host might not be able to “purge” the pernicious clade, and L. plantarum dynamics might be controlled solely by the relative fitness between clades in the given experimental treatment. We will discuss this hypothesis in the revision as a way to explain the relationship between the abundance of each clade and its effect on the host.

      I will also note that this is not the first time that L. plantarum or other Lactobacillus have been shown to exert fitness costs to Drosophila. Gould, PNAS, 2018, shows that both Lactobacillus plantarum and Lactobacillus brevis in mono-association have lower fitness (measured through Leslie matrix projections using lifespan and fecundity) than axenic flies. Many studies of wild Drosophila fail to find Lactobacillus, or it is low abundance (e.g., Chandler, PLoS Genetics, 2014; Wang, Environmental Microbiology Reports, 2018; Henry & Ayroles, Molecular Ecology, 2022; Gale, AEM, 2025). This might help provide useful context for the in vivo results.

      We thank the reviewer for the references. These observations will be compared to our phenotypic results and discussed in the revised version of the manuscript.

      (3) The data in Figure 4 are compelling to focus on the L. plantarum variants. However, I can see from the methods that the competitive mapping included only other strains of Wolbachia.

      We appreciate the reviewer's thorough reading of the methods. The competitive mapping comprised two steps. First, we discarded the reads that mapped to Drosophila, Wolbachia, and additional potential contaminants from sequencing facilities (human, dog, etc.). This step leaves the reads originating from the whole external microbiome of the flies, including L. plantarum. The second competitive mapping step recruits the reads that map to any clade of L. plantarum.

      It is not clear how other members of the microbiome changed in response to the temperature regimes. As I note in point #2, given that Lactobacillus is often rare, it is not clear what the rest of the microbiome looks like over the course of adaptation. Indeed, it seems that Mazzucco & Schlötterer, PRSB, 2021 did a broader analysis of the microbiome and found that Acetobacter is by far the most common bacterium (I think these data are also part of the data shown here?). Explaining whether the same holds in this context, and why or why not, is important and will improve this study, particularly if the focus is on connecting these evolutionary dynamics to ecological competition to explain the emergence of strain diversity.

      We acknowledge that the rest of the Drosophila microbiome is not addressed in this study, as we wanted to focus the storyline around the intraspecific dynamics found in L. plantarum. We consider that a complete characterization of the whole Drosophila microbiome would unnecessarily elongate the paper and thus we treat it as a constant biotic factor.

      We must point out that our dataset is not the one reported by Mazzucco & Schlötterer, which was done in D. melanogaster, rather than D. simulans. Nevertheless, both experiments share the same infrastructure, temperature regimes and fly maintenance.

      We will include a list of taxa that were isolated from the populations, and report L. plantarum prevalence and abundance across the experiment, in order to provide readers with context on the microbiome beyond L. plantarum.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript, Gracia-Alvira et al. investigated how environmental temperature affects competition among members of the microbiome, with a focus on intraspecific diversity, using the Drosophila model. Notably, the authors identified three clades of Lactiplantibacillus plantarum from a natural population of Drosophila simulans collected in Florida. They tracked the dynamics of these three bacterial clades under two temperature conditions over the course of more than ten years. Using comparative genomics and phylogeny, they showed that these three bacterial clades likely adapted to their host independently in a temperature-specific manner. Further, by combining in vitro culture and in vivo mono-association assays, they demonstrated phenotypic functional divergence among these three bacterial clades, including in their growth dynamics and effects on host fitness. Lastly, they performed pathway analysis and speculated on key genomic variants underlying such functional divergence.

      Strengths:

      The laboratory evolutionary experiment in response to cold or hot environmental temperature is impressive, given that it spans more than ten years. This collection of archived microbiome samples paired with the fly host data can be a valuable resource for the field.

      Weaknesses:

      The laboratory evolutionary experiment may be limited by its artificial setup. For example, wild flies rely on a more diverse set of food sources and are constantly exposed to new bacterial inoculations, whereas under laboratory conditions flies live in a more restricted ecosystem. In addition, environmental temperatures not only differ among locations but also change seasonally within the same region. This manuscript could be strengthened by further discussion of these limitations.

      As the reviewer has correctly noted, our experimental setting is not exempt from limitations. Lab-reared flies are fed a defined standard diet. Furthermore, although the system is not completely closed to bacterial migration, migration is limited because replicate populations are not allowed to mix during fly maintenance. For this reason, we consider our laboratory setting a compromise between observing wild populations, which undergo all biotic and abiotic stresses but cannot be manipulated, and evolving the bacteria in the absence of the host, or in gnotobiotic hosts, in which biotic interactions are not fully considered. We will expand on this in the new version of the manuscript.

      Moreover, the extent of host effects in these experiments remains ambiguous, because it is unclear whether these Lactiplantibacillus plantarum mostly reside within fly guts or on Drosophila medium. The laboratory evolutionary experiment possibly favored better colonizers of Drosophila medium under either cold or hot temperatures, which could subsequently saturate fly guts. As fully dissociating these variables can be experimentally tedious, the authors may want to comment more on these aspects in the discussion, or consider some additional measurements: for example, measuring the growth rate of these bacteria on Drosophila medium at different temperatures, in addition to the current MRS culture experiments, or measuring the proportion of Lactiplantibacillus on Drosophila medium versus that stably colonizing fly guts.

      The reviewer's point was briefly addressed in the Results section "Phenotypic differences in liquid culture".

      Reviewer #3 (Public review):

      Summary:

      The study presents an analysis of 297 pangenomes derived from 20 populations of Drosophila simulans, at 19 time points for fast-reproducing individuals in a hot environment, or at 10 time points for slow-reproducing individuals in a cold environment, over a period of more than 10 years. The authors select a particular microbial component of the pangenomes and study the dynamics of Lactiplantibacillus plantarum strains in two environments. They discover that the revealed operational taxonomic units could be divided into three phylogenetic clades, which have their own genomic and genetic features, different adaptive capabilities that depend on the environment, and have a distinct impact on the fitness of the host.

      Strengths:

      The authors prove that bacterial microbiome components are sensitive to the environment and could rapidly (within years) become fixed in eukaryotic populations. This study establishes a tractable model that potentially enables the study of variability in the physiological influence of distinct strains of an important commensal species, Lactiplantibacillus plantarum, on the Drosophila host. It is clearly shown that this single species consists of several phylogenetically and functionally diverse strains. The authors did not limit their interest to their own model; rather, they integrated a comparative approach by analysing phylogenetic relationships among 92 described L. plantarum strains.

      Overall, the study is novel and delivers important discoveries from a longitudinal, well-replicated experiment, generating a substantial amount of genomic data. It highlights an important dimension of research: that environmental selection operates at the subspecies level.

      Weaknesses:

      Even though the authors show only one particular example from their longitudinal experiment, they honestly acknowledge shortcomings important for interpreting the biological significance of the results (gnotobiotic mono-association experiments were done with D. melanogaster, but not D. simulans), and they therefore state the limitations of their conclusions (weaker effects in the non-axenic flies are due to the presence of other taxa or to higher-order interactions with other members of the microbiome). These interactions could significantly affect bacterial growth, metabolism, and physiological influence on the host.

      We agree with the reviewer that the use of gnotobiotic animals is a limitation, as by "tuning" the flies' microbiome we modify the interactions between its members, which can potentially change the phenotypic outcome. Nevertheless, we use it as a complementary approach, rather than as the only basis for inference in our study.

      The authors exploit the results of their experiment to speculate about a wide range of evolutionary phenomena, like within-species competition, ecological adaptation and evolution of the host, fitness advantage of bacteria to the host, the benefits of parasitism or mutualism, the domestication of the microbiome, etc. At the end, they conclude that their study "highlights that even subspecies diversity plays a key role in adaptation to environmental temperature". However, the potential mechanisms of such adaptation are barely discussed, so that the focus of the study shifts from the temperature-induced changes in microbial population structures toward metabolism-related adaptations of clade representatives that enable them to diversify their carbon and nitrogen sources. The role of the temperature factor remains elusive.

      We acknowledge that our study does not fully resolve the mechanism by which a different clade comes to dominate each temperature regime. The MRS liquid culture experiment was an attempt to test whether differences in optimal growth temperature could explain the temperature-specific abundance of the two clades. Our experiments showed, however, that this was not the case. Beyond this point, it is hard to disentangle the role of temperature, as it could also act indirectly on the bacteria, for example through the host or the food.

      A second observation in our time series was that a third clade, U, was unfit in both regimes despite starting the experiment in high abundance. For this reason we also studied what made this clade less fit. Based on our analyses, we propose that the decrease of clade U was driven by the shift to a laboratory diet, shared by all experimental populations.

      In addition, the paper takes a clearly minimalistic experimental approach to address the functional properties of the revealed L. plantarum strains, so that their own fitness, or their relationship with the Drosophila host, is characterised superficially. Therefore, the authors' discourse can be speculative rather than factual (especially when the authors use the expression "likely" to share their guesses in the "Results" section). Nevertheless, these minor drawbacks do not undermine the novelty of the discovered phenotypes or the importance of their further investigation.

      We take the reviewer's concern on board and will tone down the phrasing used to report our findings in the revised version of the manuscript.

    1. eLife Assessment

      This important work demonstrates the role of physically linking the core and CTD kinase modules of TFIIH via separate domains of subunit Tfb3 in confining RNA Polymerase II Serine 5 CTD phosphorylation to promoter regions of transcribed genes in budding yeast. The main findings, resulting from analyses of viable Tfb3 mutants in which the linkage between TFIIH core and kinase modules has been severed, are supported by solid evidence from in vitro and in vivo experiments. The new findings raise the intriguing possibility that the Tfb3-mediated connection between core and kinase modules of TFIIH is an evolutionary addition to an ancestral state of physically unconnected enzymes.

    2. Reviewer #1 (Public review):

      Giordano et al. demonstrate that yeast cells expressing separated N- and C-terminal regions of Tfb3 are viable and grow well. Using this creative and powerful tool, the authors effectively uncouple CTD Ser5 phosphorylation at promoters and assess its impact on transcription. This strategy is complementary to previous approaches, such as Kin28 depletion or the use of CDK7 inhibitors. The results are largely consistent with earlier studies, reinforcing the importance of the Tfb3 linkage in mediating CTD Ser5 phosphorylation at promoters and subsequent transcription.

      Notably, the authors also observe effects attributable to the Tfb3 linker itself, beyond its role as a simple physical connection between the N- and C-terminal domains. These findings provide functional insight into the Tfb3 linker, which had previously been observed in structural studies but lacked clear functional relevance. Overall, I am very positive about the publication of this manuscript and offer a few minor comments below that may help to further strengthen the study.

      Page 4: "PIC structures show the linker emerging from the N-terminal domain as a long alpha-helix running along the interface between the two ATPase subunits, followed by a turn and a short stretch of helix just N-terminal to a disordered region that connects to the C-terminal region (see schematic in Fig. 1A)."

      The linker helix was only observed in the poised PIC (Abril-Garrido et al., 2023), not other fully-engaged PIC structures.

      Page 8: "Recent structures (reviewed in (Yu et al., 2023)) show that the Kinase Module would block interactions between the Core Module and other NER factors. Therefore, TFIIH either enters into the NER complex as free Core Module, or the Kinase Module must dissociate soon after."

      To my knowledge, this is still controversial in the NER field. I note that the potential function of the kinase module in NER is likely mediated by the N-terminal region of Tfb3 through its binding to Rad3. Because the yeast strains used in Fig. 6 retain the N-terminal region of Tfb3, the UV sensitivity assay presented here is unlikely to directly address the contribution of the kinase module to NER.

      Page 11: "Notably, release of the Tfb3 Linker contact also results in the long alpha-helix becoming disordered (Abril-Garrido et al., 2023), which could allow the kinase access to a far larger radius of area. This flexibility could help the kinase reach both proximal and distal repeats within the CTD, which can theoretically extend quite far from the RNApII body."

      Although the kinase module was resolved at low resolution in all PIC-Mediator structures, these structural studies consistently reveal the same overall positioning of the kinase module on Mediator, indicating that its localization is constrained rather than variable. This observation suggests that the linker region may help position the kinase module at this specific site, likely through direct interactions with the PIC or Mediator. This idea is further supported by numerous cross-links between the linker region and Mediator (Robinson et al., 2016).

      Comments on revisions:

      Revised ms clarified all my points, including those I previously misunderstood.